Stream: general

Topic: Determinism/Reproducibility


view this post on Zulip indolering (Oct 23 2020 at 22:53):

So Dan and I were discussing deterministic/reproducible behavior in WASI.

view this post on Zulip indolering (Oct 23 2020 at 22:54):

One thing that came up is sort order, which is a cause of non-determinism in a lot of languages.

view this post on Zulip Dan Gohman (Oct 23 2020 at 22:57):

I don't know of any prior art in this space, so all I have right now are ideas off the top of my head

view this post on Zulip indolering (Oct 23 2020 at 22:57):

I had suggested earlier some sort of way to control deterministic/reproducible behavior, such as defining the sort order of files based on name (as opposed to timestamps). This would be slow, especially without the help of the underlying filesystem/b-tree structure.

view this post on Zulip Dan Gohman (Oct 23 2020 at 22:57):

One option would be to have the implementation maintain its own index, that it'd update every time it creates or renames a file.

view this post on Zulip indolering (Oct 23 2020 at 22:58):

But have that be configurable, so you can turn that on, but it will be slow and not all runtimes will support it.

view this post on Zulip Dan Gohman (Oct 23 2020 at 22:58):

Then, fd_readdir etc. would iterate over that index, rather than over the host directory.

view this post on Zulip Dan Gohman (Oct 23 2020 at 22:58):

Right, slow, and it'd get out of date if other processes can access the dir.

view this post on Zulip Dan Gohman (Oct 23 2020 at 22:59):

So yeah, we'd want it to be optional

view this post on Zulip indolering (Oct 23 2020 at 23:02):

There are FUSE filesystems, Android used wrapfs to enable case-folded file lookup.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:02):

I wonder if the problem of other processes could be solved, or at least mitigated, by having the implementation compare the timestamp of the index to the timestamp of the directory. If the directory is more recently modified, re-scan the directory and regenerate the index.
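
As a rough illustration of the index-plus-timestamp idea above, here is a minimal Python sketch. The class name `SortedDirIndex` and its methods are hypothetical, not part of WASI or any runtime, and it rescans on any mtime difference rather than only when the directory is strictly newer.

```python
import os
import tempfile


class SortedDirIndex:
    """Cache of a directory's entry names in sorted order (hypothetical helper)."""

    def __init__(self, path):
        self.path = path
        self.names = []
        self.scanned_mtime_ns = None  # directory mtime observed at the last scan

    def _rescan(self):
        # Read the mtime before listing, so a change that lands mid-scan
        # makes the next call rescan instead of trusting a stale index.
        mtime_ns = os.stat(self.path).st_mtime_ns
        self.names = sorted(os.listdir(self.path))  # code point order
        self.scanned_mtime_ns = mtime_ns

    def readdir(self):
        # Regenerate the index if the directory looks modified since the last scan.
        if os.stat(self.path).st_mtime_ns != self.scanned_mtime_ns:
            self._rescan()
        return list(self.names)


with tempfile.TemporaryDirectory() as d:
    for name in ("b.txt", "a.txt", "c.txt"):
        open(os.path.join(d, name), "w").close()
    index = SortedDirIndex(d)
    print(index.readdir())  # ['a.txt', 'b.txt', 'c.txt']
```

This only mitigates the problem: directory timestamps have limited granularity, so a change made by another process right after a scan can go unnoticed.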

view this post on Zulip indolering (Oct 23 2020 at 23:02):

And disorderfs, which can actually list directories in a specified sort order.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:03):

Ah, FUSE etc. is a good idea too. If we can make the host FS do what we need, that'd avoid the need for an external index

view this post on Zulip indolering (Oct 23 2020 at 23:03):

In my research on case-folded lookups on case-sensitive file systems, all of the solutions were racy.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:05):

Maybe that's the first question then. Are there any non-racy solutions?

view this post on Zulip indolering (Oct 23 2020 at 23:06):

I don't know.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:06):

Or perhaps we just need to say, don't enable this option if anyone can access the FS concurrently?

view this post on Zulip indolering (Oct 23 2020 at 23:07):

I mean, SQLite can be as fast as filesystem access. So a virtual filesystem could work.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:09):

If that meets your needs, it sounds like a reasonable thing to do to me

view this post on Zulip indolering (Oct 23 2020 at 23:09):

BeFS allowed arbitrary metadata, but no one seems eager to emulate that FS.

view this post on Zulip indolering (Oct 23 2020 at 23:09):

OS X, Windows, etc all just maintain a file-based search index.

view this post on Zulip indolering (Oct 23 2020 at 23:10):

But again, I think it would be possible to define a well specified standard but allow non-deterministic access that relies on the underlying FS.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:12):

Could you describe your goal here in more detail?

view this post on Zulip indolering (Oct 23 2020 at 23:12):

If you can assume an exclusive lock on the directory, then you should be able to enable case-folded lookups on that directory in Linux. This is currently only working on EXT4 and F2FS but those efforts are very much intended to spread to the other Linux filesystems.

view this post on Zulip indolering (Oct 23 2020 at 23:13):

The filesystem is a global shared state, it helps if you can nail down its behavior as finely as possible.

view this post on Zulip indolering (Oct 23 2020 at 23:14):

My other big concern from earlier is preventing a standard which adopts case-sensitivity by default, as it's very hard to undo that choice later.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:14):

Does Linux have a way to lock directories?

view this post on Zulip indolering (Oct 23 2020 at 23:14):

As a usability engineer, we are often brought in waaaaay too late to make any changes.

view this post on Zulip indolering (Oct 23 2020 at 23:15):

I mean, through permissions. But ~/.wine essentially assumes exclusive access and enables case-insensitivity on that directory.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:17):

If you have an implementation in mind, could you write up some pseudo-code for how, eg. creating a file, and listing the contents of a directory, would work?

view this post on Zulip indolering (Oct 23 2020 at 23:19):

I don't, the simplest would be to create a sort order based on unicode codepoints. But that is a basic sort.
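
A minimal sketch of what a code point sort looks like, in Python. The function names are hypothetical. It relies on two facts: Python compares `str` values by code point, and sorting by UTF-8 bytes gives the same order, while sorting by UTF-16 code units does not once characters outside the BMP are involved.

```python
def codepoint_sorted(names):
    """Sort filenames by Unicode code point; Python compares str this way."""
    return sorted(names)


def utf8_sorted(names):
    """Sort by encoded UTF-8 bytes, which yields the same order as code points."""
    return sorted(names, key=lambda n: n.encode("utf-8"))


def utf16_sorted(names):
    """Sort by big-endian UTF-16 code units, as a UTF-16-based host might."""
    return sorted(names, key=lambda n: n.encode("utf-16-be"))


# U+FF41 is a fullwidth "a"; U+1F600 is an emoji outside the BMP.
names = ["b.txt", "B.txt", "a.txt", "\uff41.txt", "\U0001F600.txt"]

print(codepoint_sorted(names) == utf8_sorted(names))   # True
print(codepoint_sorted(names) == utf16_sorted(names))  # False
```

Note that this is indeed a basic sort: "B.txt" lands before "a.txt" because uppercase ASCII letters have lower code points.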

view this post on Zulip indolering (Oct 23 2020 at 23:20):

AFAICT, EXT4, BTRFS, and F2FS all maintain a hashed filename index. So it won't be sorted according to anything close to Unicode sort order, but still sorted.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:20):

So picking a specific ordering is one thing, yes, but I'm still trying to get a picture for what problem you're looking to solve, or which specific context you're looking to solve it in.

view this post on Zulip indolering (Oct 23 2020 at 23:22):

The specific problem I'm looking to solve is serving up files from the FS in an order based on the filename, as opposed to insertion order in the b-tree.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:23):

Cool, and are you hoping to enable this for general-purpose use, or only within virtual filesystems and perhaps private-directory filesystems where we have exclusive access?

view this post on Zulip indolering (Oct 23 2020 at 23:23):

Disorderfs is actually used by the reproducible build community to debug builds where the FS ordering of files creeps into build scripts.

view this post on Zulip indolering (Oct 23 2020 at 23:23):

The last two.

view this post on Zulip indolering (Oct 23 2020 at 23:24):

We could do general purpose, but I would just stub that code out and document how you would like it done.

view this post on Zulip indolering (Oct 23 2020 at 23:24):

Listing specific directories based on filename in a deterministic fashion is totally doable.

view this post on Zulip indolering (Oct 23 2020 at 23:25):

Emulating case-insensitivity on a case-sensitive filesystem can work well enough in 80

view this post on Zulip indolering (Oct 23 2020 at 23:25):

% of cases.

view this post on Zulip indolering (Oct 23 2020 at 23:27):

But it's always going to be slower, racy, and blow up in your face if you do something dumb like "/foo/BAR/readme.txt" and "/foo/bar/readme.txt"

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:28):

I imagine for virtual-fs and private-directory cases we can case-fold the host files, so it'd work

view this post on Zulip indolering (Oct 23 2020 at 23:28):

Perhaps we should discuss filename, encoding, and case-sensitivity and to what degree the API should enforce its opinions on everyone?

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:28):

Are you venturing into the general-purpose side of things now?

view this post on Zulip indolering (Oct 23 2020 at 23:28):

Yeah.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:30):

My working assumption is that case-insensitivity is just "whether and how it's done is nondeterministic"

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:30):

in the general-purpose case

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:30):

and we just let whatever the host FS does shine through

view this post on Zulip indolering (Oct 23 2020 at 23:31):

I don't think we can impose our invariants on the underlying filesystem without things breaking.

view this post on Zulip indolering (Oct 23 2020 at 23:31):

So agree there.

view this post on Zulip indolering (Oct 23 2020 at 23:31):

But I strongly disagree that we should fail if (as was suggested) the case differs from what was looked up.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:32):

Ah, interesting.

view this post on Zulip indolering (Oct 23 2020 at 23:32):

You are basically making the API opinionated in favor of case-sensitivity, which is not what end-users/consumers want.

view this post on Zulip indolering (Oct 23 2020 at 23:33):

And it's a nightmare to try and fix that choice later.

view this post on Zulip indolering (Oct 23 2020 at 23:33):

Have you seen sandboxfs?

view this post on Zulip indolering (Oct 23 2020 at 23:33):

Actually, let's shelf that for a minute.

view this post on Zulip indolering (Oct 23 2020 at 23:33):

shelve*

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:36):

That "check to see if the case differs" is meant to avoid programs which accidentally depend on running on case-insensitive filesystems

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:37):

Your point seems to be "case insensitive filesystems are The Path Forward", so we should embrace them and not risk locking ourselves out of an all-case-insensitive future

view this post on Zulip indolering (Oct 23 2020 at 23:38):

primarily-case-insensitive future.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:38):

I hadn't thought of it like that

view this post on Zulip indolering (Oct 23 2020 at 23:38):

But yeah, mostly.

view this post on Zulip indolering (Oct 23 2020 at 23:38):

I mean, people have scripts that work on OS X but fail on Linux because case suddenly matters.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:40):

What we might do, is have that check, but don't make it part of the WASI spec. Just make it a debugging feature that engines can optionally provide.

view this post on Zulip indolering (Oct 23 2020 at 23:40):

What confuses me is that you are proposing normalizing all names to UTF-8, which is similar to case-folding in that the string used to look up a file in a UCS-2 language won't match the string returned as the filename.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:41):

assuming we do something like ARF strings or modified UTF8-C8, the difference is that the translation is reversible

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:41):

case-folding is lossy
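
To make the reversible-versus-lossy distinction concrete, here is a small Python sketch. It does not implement ARF strings or UTF8-C8; it swaps in Python's `surrogateescape` error handler as a stand-in that has the same round-trip property. The helper names are hypothetical.

```python
def to_text(raw: bytes) -> str:
    """Decode a host filename; ill-formed bytes become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def to_bytes(text: str) -> bytes:
    """Invert to_text, restoring the original bytes exactly."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8
print(to_bytes(to_text(raw)) == raw)  # True: the translation is reversible

# Case folding maps distinct names to one result, so the original
# spelling cannot be recovered from the folded form.
print("README.txt".casefold() == "ReadMe.txt".casefold())  # True
```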

view this post on Zulip indolering (Oct 23 2020 at 23:41):

That's still pushing people to use case, which shouldn't matter.

view this post on Zulip indolering (Oct 23 2020 at 23:42):

For sure, but it will still require manual refactoring.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:42):

The theory with ARF strings is that ill-formed filenames are so rare that many programs wouldn't need to bother

view this post on Zulip indolering (Oct 23 2020 at 23:43):

Also, if you are going to convert to UTF-8, are you also going to normalize to NFC?

view this post on Zulip indolering (Oct 23 2020 at 23:43):

Agreed.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:43):

No, I expect NFC/NFD is just "whatever the host does"

view this post on Zulip indolering (Oct 23 2020 at 23:45):

Hrm.

view this post on Zulip indolering (Oct 23 2020 at 23:46):

This is a tad above my paygrade: my understanding is that if you are going to mess with encoding, then you should probably do NFC too. <- might be wrong.

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:46):

arbitrary codepoint sequence -> NFC or NFD is also lossy

view this post on Zulip indolering (Oct 23 2020 at 23:47):

Basically, you have to do NFC or NFD if you want deterministic sort order.
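
A short Python sketch of why normalization matters for sort order, using the standard `unicodedata` module. The `sort_key` helper is hypothetical. The same visible name sorts on different sides of "f.txt" depending on whether the host stored it composed or decomposed.

```python
import unicodedata

composed = "\u00e9.txt"     # "é" as one code point (NFC form)
decomposed = "e\u0301.txt"  # "e" plus a combining acute accent (NFD form)
other = "f.txt"


def sort_key(name, form="NFC"):
    """Normalize a name before comparing, so both spellings sort alike."""
    return unicodedata.normalize(form, name)


print(sorted([composed, other]) == [other, composed])      # True: é sorts after f
print(sorted([decomposed, other]) == [decomposed, other])  # True: e + accent sorts before f
print(sort_key(composed) == sort_key(decomposed))          # True: one key after normalizing
```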

view this post on Zulip indolering (Oct 23 2020 at 23:48):

NFD is faster, but doesn't match what most keyboards and applications do, so it makes it hard to find the file in the filesystem. But since you don't care about re-encoding the file name....

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:49):

in the general-purpose case, we can't prevent other processes from creating non-NFC or non-NFD names on hosts which don't enforce those

view this post on Zulip indolering (Oct 23 2020 at 23:49):

Linus hates this, it was seen as a mistake for HFS+ to do this, and that all filesystems should preserve encoding/case as the filename returned but store an NFD/NFC normalized filename in the metadata.

view this post on Zulip indolering (Oct 23 2020 at 23:50):

Okay, I actually agree we should go that way.

view this post on Zulip indolering (Oct 23 2020 at 23:51):

Alright, so I will post my concerns about blowing up when there is a case mismatch in that filename thread.

view this post on Zulip indolering (Oct 23 2020 at 23:51):

And post info about performance in the Gist I made.

view this post on Zulip indolering (Oct 23 2020 at 23:51):

Uhh, last thing I wanted to discuss: using FUSE/encryption to enforce access control and determinism.

view this post on Zulip indolering (Oct 23 2020 at 23:52):

Bazel created sandboxfs to speed up access-control permissions in builds.

view this post on Zulip indolering (Oct 23 2020 at 23:53):

I believe Apple uses encryption to enforce access control too, IIRC it gets as finely grained as append only writes.

view this post on Zulip indolering (Oct 23 2020 at 23:54):

That would also help with determinism.

view this post on Zulip indolering (Oct 23 2020 at 23:54):

Do you know of any literature that defines the limits of what encryption based ocap can do?

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:56):

(to address your comment above, normalizing to NFC is worth considering, but in the general-purpose case we similarly run into O(2^N) situations where you have to check for all possible variations of non-normalized filenames created by other processes)

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:56):

At the level we're operating at right now, all encryption is "the next level down"

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:57):

Bits on disk are encrypted by the host's filesystem code, but that's all transparent outside of the kernel

view this post on Zulip Dan Gohman (Oct 23 2020 at 23:59):

userspace never sees the encrypted bits, and the only security function the encryption serves is to foil attackers that can access the underlying storage media directly, which userspace can't do under normal circumstances

view this post on Zulip Dan Gohman (Oct 24 2020 at 00:01):

capability-based security is about controlling what things you can ask for. The basic idea is that you're given handles, which are program values that you can pass around as arguments and return values and so on, that represent resources you have access to, instead of naming things with strings

view this post on Zulip Dan Gohman (Oct 24 2020 at 00:03):

open("foo", O_RDONLY) plucks the name foo from thin air and requests access to it. This implies a global namespace in which the request can be resolved, which might have ACLs to govern access, but ACLs are awkward to manage in a bunch of ways.
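
A minimal sketch of the handle idea on a POSIX host, in Python. The `open_in` helper is hypothetical. A directory file descriptor plays the role of the handle, and the name is resolved relative to it rather than in a global namespace.

```python
import os
import tempfile


def open_in(dir_fd, name):
    """Open `name` relative to a directory handle instead of a global path."""
    return os.open(name, os.O_RDONLY, dir_fd=dir_fd)


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "foo"), "w") as f:
        f.write("hello")

    dir_fd = os.open(d, os.O_RDONLY)  # the handle someone explicitly gave us
    try:
        fd = open_in(dir_fd, "foo")
        try:
            print(os.read(fd, 5))  # b'hello'
        finally:
            os.close(fd)
    finally:
        os.close(dir_fd)
```

This is only an illustration of the calling convention. A plain relative open like this still follows ".." and ignores the handle for absolute paths, so a real sandbox would need to reject or constrain those cases.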

view this post on Zulip indolering (Oct 24 2020 at 00:05):

Yeah, I get that. But I believe the original E language had some crypto capabilities built into it.

view this post on Zulip Dan Gohman (Oct 24 2020 at 00:06):

There is some research into using handles across networks.

view this post on Zulip indolering (Oct 24 2020 at 00:06):

And there are some researchers who would like to reduce the security of a lot of systems down to cryptographic primitives.

view this post on Zulip indolering (Oct 24 2020 at 00:06):

But out of scope, clearly.

view this post on Zulip Dan Gohman (Oct 24 2020 at 00:09):

The key property of handles is that you can't "forge" them, or obtain them without being explicitly given them. Within a single host, that's straightforward, but if you want to pass a handle to another host on the network, and they might pass it on to someone else, how do you ensure that no one can forge such a handle?

view this post on Zulip Dan Gohman (Oct 24 2020 at 00:12):

I don't have links handy, but there has been research along those lines

view this post on Zulip indolering (Oct 24 2020 at 00:13):

Yeah.

view this post on Zulip indolering (Oct 24 2020 at 00:14):

The only comprehensive survey of capabilities I've seen is by the CHERI people, and they have formal models tying their capabilities to memory-tagged hardware. I'll go ask them about alternative modes of enforcement.

view this post on Zulip indolering (Oct 24 2020 at 00:14):

Okay, thanks dan!

view this post on Zulip Dan Gohman (Oct 24 2020 at 00:15):

yw!

view this post on Zulip indolering (Oct 24 2020 at 00:30):

Oh, Dan, it's not O(N^filename) runtime.

view this post on Zulip indolering (Oct 24 2020 at 00:30):

It's O(log files)

view this post on Zulip indolering (Oct 24 2020 at 00:31):

As you just get a list of all files in a directory and convert their names to the canonical casefolded name.

view this post on Zulip indolering (Oct 24 2020 at 00:31):

And do a compare.

view this post on Zulip indolering (Oct 24 2020 at 00:34):

You would only get exponential behavior if someone created a lot of directories with lots of cases, such as /foo/bar/baz/..., /Foo/bar/baz/..., /FOo/bar/baz.

view this post on Zulip indolering (Oct 24 2020 at 00:36):

Even then, if your goal is to create a lint, you could just error when there are no exact case-sensitive matches but multiple case-insensitive matches.
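
The lookup and lint described in the last few messages could be sketched roughly as follows in Python. The `resolve` function and `AmbiguousName` exception are hypothetical names, and the sketch works on a plain list of names (for example the result of `os.listdir`) rather than on a live directory.

```python
class AmbiguousName(Exception):
    """Raised when several entries match only case-insensitively."""


def resolve(names, wanted):
    """Return the directory entry that `wanted` refers to."""
    # An exact, case-sensitive match always wins.
    if wanted in names:
        return wanted

    # Otherwise compare canonical case-folded names.
    folded = wanted.casefold()
    matches = sorted(n for n in names if n.casefold() == folded)

    if not matches:
        raise FileNotFoundError(wanted)
    if len(matches) > 1:
        # The lint case: no exact match, several case-insensitive ones.
        raise AmbiguousName(f"{wanted!r} could be any of {matches}")
    return matches[0]


print(resolve(["README.txt", "notes.md"], "readme.TXT"))  # README.txt
print(resolve(["bar", "BAR"], "bar"))                     # bar
```

Each lookup is a linear scan of the directory listing, and the result can be stale if another process changes the directory between the listing and the open.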

view this post on Zulip Jubilee (Oct 28 2020 at 04:16):

I am mildly perplexed by "primarily case insensitive future" because Mac and Windows both introduced the ability to make their file systems case sensitive, and I don't think it's any less troublesome going from case insensitive to case sensitive here.

view this post on Zulip Dan Gohman (Oct 28 2020 at 21:54):

@indolering If I need to create N files (suppose I'm unpacking an archive, compiling lots of source files to object files, etc.), and each time I create a file I have to scan the directory to see if there's a file with a case-folding-equivalent name, it goes O(N^2) in the number of files I'm creating
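
A small Python sketch that counts the work in the scenario above. Both functions are hypothetical simulations that never touch a real filesystem. The second one assumes exclusive access to the directory, since an in-memory index of folded names is only trustworthy when nobody else can create files.

```python
def comparisons_with_rescan(n):
    """Count folded-name comparisons when every create scans the directory."""
    existing = []
    comparisons = 0
    for i in range(n):
        name = f"file{i}.o"
        folded = name.casefold()
        for other in existing:
            comparisons += 1
            if other.casefold() == folded:
                raise FileExistsError(name)
        existing.append(name)
    return comparisons


def lookups_with_index(n):
    """Count set lookups when a folded-name index is kept in memory."""
    folded_index = set()
    lookups = 0
    for i in range(n):
        folded = f"file{i}.o".casefold()
        lookups += 1
        if folded in folded_index:
            raise FileExistsError(folded)
        folded_index.add(folded)
    return lookups


print(comparisons_with_rescan(1000))  # 499500, i.e. n * (n - 1) / 2
print(lookups_with_index(1000))       # 1000
```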

view this post on Zulip Dan Gohman (Oct 28 2020 at 22:19):

@Jubilee My working assumption is that WASI will end up saying that host filesystem directories can be either case-sensitive or insensitive, and we just expose that to applications as-is.

view this post on Zulip Dan Gohman (Oct 28 2020 at 22:19):

Applications will just need to avoid depending on either case sensitivity or case insensitivity if they want to be portable.

view this post on Zulip Dan Gohman (Oct 28 2020 at 22:22):

There may be some debugging facilities we can add to help applications catch mistakes, and I think the observation above is, we shouldn't make those features mandatory, because if filesystems do end up converging on case-insensitive, we don't want to be stuck with those debugging features forever.

view this post on Zulip Jubilee (Oct 30 2020 at 22:11):

Mmm, that's reasonable I guess?! I will concede I myself genuinely don't see a reason that file systems should be case insensitive from a data perspective, as it makes it possible to trust an address that is not bit-equal to another address is in fact not the same address (modulo some canonicalization), and user-space can support case-insensitive comparisons where it makes sense just fine. But here I am extending more of a "would-write-an-FS perspective" than a "would-like-to-solve-platform-compatibility-issues" perspective, and so venturing a bit further afield from The Point. Aside from "everyone should clearly adopt my perspective and then there would be no more compatibility issues" which I recognize is in actuality a non-starter. :^)

view this post on Zulip indolering (Nov 01 2020 at 22:21):

@Jubilee 99% of users are on an OS with a filesystem that is case-insensitive by default.

view this post on Zulip indolering (Nov 01 2020 at 22:31):

@Dan Gohman I think there is a miscommunication WRT operating modes and error handling.

view this post on Zulip Jacob Lifshay (Nov 01 2020 at 22:32):

I heard somewhere that, among developers, it's much more evenly split between Linux, Windows, and macOS (about 1/3 for each). Nearly all Linux OSes are case-sensitive by default.

view this post on Zulip indolering (Nov 01 2020 at 22:36):

@Jacob Lifshay True, lots of server-side stuff is going to run on case-sensitive filesystems by default. That being said, at least on Stack Overflow, Linux devs are out-numbered 3:1.

view this post on Zulip indolering (Nov 01 2020 at 22:36):

https://insights.stackoverflow.com/survey/2020#technology-developers-primary-operating-systems

view this post on Zulip indolering (Nov 01 2020 at 22:37):

I also think that for all of the wailing and gnashing of teeth, distros will eventually switch to case-insensitive by default. At least for home directories.

view this post on Zulip indolering (Nov 01 2020 at 22:38):

That being said, I believe that virtually every filesystem allows for setting this behavior on a per-directory basis.

view this post on Zulip indolering (Nov 01 2020 at 22:41):

My main concern was that the tickets I read were suggesting fail fast on case-insensitive filesystems. That's a mistake.

view this post on Zulip indolering (Nov 01 2020 at 22:45):

@Jubilee The web is case-insensitive: a domain name is basically a pointer to a server and paths are basically case-insensitive lookups on a filesystem.

view this post on Zulip indolering (Nov 01 2020 at 22:47):

From a security and reproducibility perspective, you want to minimize the impedance mismatch between the client and the host.

view this post on Zulip indolering (Nov 01 2020 at 22:51):

From a data-perspective, we definitely want valid UTF-8 right?

view this post on Zulip indolering (Nov 01 2020 at 22:52):

@Jubilee Sorry, I'll stop hammering you. I'm writing another ticket right now and my thought streams are crossing :P

view this post on Zulip indolering (Nov 01 2020 at 22:53):

image.png <img src="http://alinken.people.ua.edu/uploads/8/7/9/2/87929690/published/ghostbusters.jpg?1501582414" alt="Picture"/>

view this post on Zulip Jubilee (Nov 01 2020 at 23:01):

@indolering That is not true anymore, and has not been for a long time.
iOS and Android OS use primarily case sensitive FS while supporting operations on case-insensitive FS for back-compat and with many applications exposing case-insensitive functionality.
i.e. my preferred scheme.
As far as the web goes, not all servers implement those accesses as case insensitive, and of those which do, they commonly involve a redirect to a canonical version. It took... 3 tries? to find a case-sensitive access that fails without a redirect.

view this post on Zulip Jubilee (Nov 01 2020 at 23:02):

( the first hardly counts, since I was just doublechecking against my memory of domain names being case insensitive. )

view this post on Zulip Jacob Lifshay (Nov 01 2020 at 23:07):

indolering said:

The web is case-insensitive: a domain name is basically a pointer to a server and paths are basically case-insensitive lookups on a filesystem.

DNS is case insensitive (but only for ASCII I think -- IIRC punycode doesn't do any case folding before encoding). Most servers run Linux with case-sensitive filesystems, so I'd expect the other parts of a url to be usually case sensitive.

view this post on Zulip Jubilee (Nov 01 2020 at 23:16):

From a security perspective, the fact that I can write the link https://googIe.com and it does not go to the same place as https://google.com is a source of endless phishing tech, so no, you may consider me suitably skeptical that even exposing case insensitivity to a user serves their security that much.

And from a data perspective an address can be raw bytes for all I care (and still must, because not all OS enforce UTF-8 path validity!). File systems should cast their eye to living a life longer than an encoding scheme. It's, again, userspace's job to make it intelligible in my opinion. Sometimes an impedance mismatch is simply why we have software.

view this post on Zulip Dan Gohman (Nov 02 2020 at 00:36):

@Jubilee FWIW, WASI-filesystem is expected to use UTF-8 paths. The "filenames are just bytes" strategy was practical in its day, but with UTF-8 the practicalities line up very differently.

view this post on Zulip indolering (Nov 02 2020 at 00:36):

I stand corrected: path handling is case preserving and server/filesystem dependent, but Linux of course defaults to case sensitive matching. I guess I was in DNS land for too long!

view this post on Zulip indolering (Nov 02 2020 at 00:48):

I mean, from the perspective of usability (carrying out end-user intent) and enforcing a single namespace in the filesystem, you want the lowest common denominator normalization (NFKD casefold) so that an attacker can't do something like store a ligature ℀ which some server or client could NFKD into a filename path.
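
The ligature concern can be demonstrated with Python's standard `unicodedata` module. The `nfkd_casefold` helper below is a hypothetical approximation (NFKD, then case fold, then NFKD again), not the exact Unicode NFKC_Casefold operation. The character ℀ (U+2100) decomposes under NFKD into "a/c", which contains a path separator.

```python
import unicodedata


def nfkd_casefold(name):
    """Approximate a lowest-common-denominator form of a filename."""
    folded = unicodedata.normalize("NFKD", name).casefold()
    # Case folding can produce text that is no longer normalized,
    # so normalize once more.
    return unicodedata.normalize("NFKD", folded)


print(nfkd_casefold("\u2100"))           # a/c
print("/" in nfkd_casefold("\u2100"))    # True: one character became a path
print(nfkd_casefold("\ufb01le.TXT"))     # file.txt (the "fi" ligature expands)
```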

view this post on Zulip indolering (Nov 02 2020 at 00:50):

And the only way to be sure you don't accidentally allow two unicode strings that normalize into some other unicode string is at the filesystem level.

view this post on Zulip indolering (Nov 02 2020 at 00:51):

But yeah, I also don't want some firewall allowlist filter to be bypassed because filename lookups are case/normalization insensitive.

view this post on Zulip Jubilee (Nov 02 2020 at 01:29):

Oh yes, I expect WASI to use UTF-8 because that is more sensible for WASI's purposes than letting someone decide the next UTF-16 is a good idea.

view this post on Zulip Jubilee (Nov 02 2020 at 01:46):

But that already involves hitting a translation layer between existing systems and WASI, from my perspective, and at that point where the translation layer exists is subject to some negotiation (since it is, as it were, already a negotiation). But it's slightly more audacious to reformat a user's hard drive than to replace the user's kernel, which is what drives my sentiments regarding where such canonicalization "belongs".

view this post on Zulip Dan Gohman (Nov 02 2020 at 02:19):

I like how you put it -- a "negotiation" describes it well.

view this post on Zulip indolering (Nov 02 2020 at 02:49):

Agreed! Determinism can only be enforced if WASI can assume exclusive control at the FS level (akin to WINE setting case-insensitive behavior on ~/.wine/).

view this post on Zulip indolering (Nov 02 2020 at 02:57):

And a runtime can choose a fast/sloppy implementation that relies on whatever the filesystem does, which should work with 95% of real-world filenames. If you want a truly deterministic filesystem, then you probably want to do something with FUSE or VFS (I suspect some Unicode compression/transliteration scheme could fit 99% of real-world filenames into the POSIX portable filename subset, which is available on virtually every platform).
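
For reference, a check against the POSIX portable filename character set (A-Z, a-z, 0-9, period, underscore, hyphen, with the hyphen discouraged as the first character) might look like this in Python. The function name is hypothetical, and the transliteration scheme speculated about above is not implemented here.

```python
import string

# POSIX portable filename character set.
PORTABLE = frozenset(string.ascii_letters + string.digits + "._-")


def is_portable_filename(name):
    """True if `name` is non-empty, portable, and does not start with a hyphen."""
    return (
        bool(name)
        and not name.startswith("-")
        and all(c in PORTABLE for c in name)
    )


print(is_portable_filename("readme.txt"))   # True
print(is_portable_filename("caf\u00e9.txt"))  # False
```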

view this post on Zulip indolering (Nov 03 2020 at 22:00):

Just checked an iPod touch and the filesystem UI won't let me create two folders that differ in case. So for users iOS is case-insensitive. As iCloud enforces case-insensitivity, I would be surprised if Apple doesn't switch iOS to case-insensitive behavior as well.

view this post on Zulip indolering (Nov 03 2020 at 22:26):

But maybe the API should error out by default if the filename isn't an exact match, just as long as WASI can adapt to the underlying filestore in an ergonomic fashion.

view this post on Zulip indolering (Nov 03 2020 at 22:58):

If we have exclusive access to a directory (akin to Android, iOS, Flatpak, etc) then I would default to the lowest-common-denominator, so that everything "just works" regardless of the case or normalization conventions of the FS. But I need to do a review of all the issues with IDNA, Stringprep, and the i18n filesystem RFCs first.

view this post on Zulip indolering (Nov 04 2020 at 20:12):

Does anyone have experience with/thoughts on PRECIS? It's the IETF followup to the Stringprep algorithm.

view this post on Zulip Dan Gohman (Nov 05 2020 at 00:11):

Do you know if there are any filesystems which do case-sensitive lookups, but still prevent creating files that differ only in case?

view this post on Zulip indolering (Nov 05 2020 at 01:49):

Not off the top of my head. However, if a file has a non-normalized name, Linux falls back to using the bytestring as an opaque identifier.

view this post on Zulip indolering (Nov 05 2020 at 01:51):

FWIW, I am planning on documenting this behavior (with functional testing) in a git repo at some point.

view this post on Zulip indolering (Nov 07 2020 at 00:45):

I need to nail down the behavior of how filenames are handled across platforms and I'm at the point where I need to start testing. I don't know if this work will be upstreamed, but is there an infrastructure preference WRT functional testing and virtual machines?


Last updated: Jan 24 2025 at 00:11 UTC