Stream: wasmtime

Topic: Fine-grained Filesystem Access Policy


view this post on Zulip raskyld (May 25 2024 at 12:42):

Hi! Let's say you want to permit filesystem access to a wasi:cli component. To do so, you preopen a directory, which (with the current wasmtime-wasi implementation, IIUC) gives the component a handle backed by a cap_std::fs::Dir.

Now, let's say you need more control over the invocations made on this capability. In my specific use case, I would like to deny access to a specific file beneath the preopened Dir (whether it is accessed directly or reached through symlink resolution).
It seems like this is not possible ATM, and I would need to re-implement the filesystem host functions to do so.

The WasiCtxBuilder::socket_addr_check() function lets the caller implement advanced access policies through closures.

Do you think such a mechanism would be implementable / desirable for wasmtime (more likely in cap-fs-ext)? If not, do you have any suggestions on how to deal with the problem? :pray:

Thanks!
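For comparison, a filesystem counterpart to that closure-based socket hook might look roughly like the sketch below. FsOp, FsPolicy, and PolicyCtx are hypothetical names, not wasmtime-wasi API, and the closure only sees the guest-supplied path before any resolution happens.

```rust
use std::path::Path;

/// Hypothetical operations a filesystem policy could be asked about.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FsOp {
    Read,
    Write,
    Rename,
    Unlink,
}

/// Hypothetical policy type: given a guest path (relative to a preopen)
/// and an operation, return true to allow it.
pub type FsPolicy = Box<dyn Fn(&Path, FsOp) -> bool + Send + Sync>;

/// Hypothetical context that stores the embedder's closure and consults it.
pub struct PolicyCtx {
    policy: FsPolicy,
}

impl PolicyCtx {
    pub fn new(policy: FsPolicy) -> Self {
        PolicyCtx { policy }
    }

    /// Ask the embedder-provided closure whether `op` on `path` is permitted.
    pub fn check(&self, path: &Path, op: FsOp) -> bool {
        (self.policy)(path, op)
    }
}
```

A policy such as "wow.kdl is read-only" is then a one-line closure, but because it compares paths lexically, a guest asking for ./wow.kdl would already get past it; that is the aliasing problem discussed in the replies.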

view this post on Zulip Robin Brown (May 26 2024 at 20:44):

@Dan Gohman we talked about this a bit a couple of weeks ago in relation to wow. What do you think would be the best way to move forward with this kind of policy feature? Also, I think we'd specifically want to make the file read-only.

view this post on Zulip Ralph (May 27 2024 at 10:21):

that would be the preferred approach for reading in auth tokens, for example...

view this post on Zulip Dan Gohman (May 28 2024 at 21:49):

We don't have anything that can do "deny access to a specific file" today.

view this post on Zulip Dan Gohman (May 28 2024 at 22:03):

In addition to symlinks and .. paths, there's also case sensitivity, Unicode normalization in some host filesystems, and special rules on the Wonderful World of Windows (example).
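To make the aliasing problem concrete: a policy that compares the guest's path against a protected name is easy to bypass. The sketch below shows a deliberately naive check and a slightly better lexical normalizer that folds "." and ".."; both function names are hypothetical. Neither touches the filesystem, so neither can see symlinks, case-insensitive matches, or Unicode normalization. Folding ".." lexically is itself wrong when the preceding component is a symlink to a directory, since on POSIX src/.. then refers to the parent of the symlink's target.

```rust
use std::path::{Component, Path, PathBuf};

/// Naive (hypothetical) check: exact comparison of the guest-supplied path.
pub fn naive_is_protected(requested: &Path, protected: &Path) -> bool {
    requested == protected
}

/// Hypothetical lexical normalizer: drops `.` and resolves `..` against
/// earlier components. Returns None if `..` would escape the root or the
/// path is absolute. It never consults the filesystem.
pub fn lexically_normalize(path: &Path) -> Option<PathBuf> {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            Component::CurDir => {}
            Component::ParentDir => {
                if !out.pop() {
                    return None; // would escape the preopened root
                }
            }
            Component::Normal(name) => out.push(name),
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(out)
}
```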

view this post on Zulip Dan Gohman (May 28 2024 at 22:09):

So in theory this might be doable, but I think it would be tricky to get right.

view this post on Zulip Ralph (May 29 2024 at 08:23):

Windows.... sigh

view this post on Zulip Robin Brown (May 29 2024 at 17:36):

It will be tricky for sure, the question to me is about at what layer tools should solve this.

As I see it, there are 4 options:

  1. At the OS level (e.g. have wow exec the command as a user who has read-only permissions on the file)
  2. At the cap-std level (e.g. have a cap-std type that lets a policy trait be defined that controls this)
  3. At the wasmtime-wasi / bindings level (e.g. build in policy support for files, like what exists for sockets but much more sophisticated)
  4. At the guest level (e.g. build a component adapter that implements this policy on the filesystem)

What direction would you recommend @Dan Gohman ?

view this post on Zulip Dan Gohman (May 29 2024 at 18:08):

The OS level is probably going to be really inconvenient, because you have to have elevated privileges to create new users.

view this post on Zulip Dan Gohman (May 29 2024 at 18:19):

The guest level may be complex because if you don't have the ability to directly query what kind of host filesystem you're on, it may be tricky to know which path aliasing rules apply.

view this post on Zulip Dan Gohman (May 29 2024 at 18:28):

And any layer above the OS will need some way to detect directory symlinks. For example, in a path like a/b/c/d/e/f/g/h/secrets.txt, any of those path components could be a symlink, and there's no way to know without openat-ing them one path component at a time (because O_NOFOLLOW only applies to the last component of a path string). So maybe the options look like:

view this post on Zulip Dan Gohman (May 29 2024 at 18:29):

And for any option, it would require finding someone who can speak authoritatively on all Windows issues, as I have chosen as a life goal to never learn about how Windows filesystem paths work in sufficient detail.
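A rough illustration of the component-at-a-time idea using only std: the sketch walks the guest path one component at a time with symlink_metadata (which reports on a symlink itself rather than its target) and refuses to traverse any symlink or "..". This is a simplification: it is check-then-use and therefore racy, because a directory can be swapped for a symlink between the check and a later open. A robust version would hold a directory handle and openat each component with O_NOFOLLOW, which is roughly what cap-std does internally. The function name is hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper: walk `rel` beneath `root`, one component at a time,
/// rejecting symlinks at every level as well as `..` and absolute components.
/// Illustrative only: check-then-use like this is subject to TOCTOU races,
/// and it requires every component (including the last) to already exist.
pub fn resolve_no_symlinks(root: &Path, rel: &Path) -> io::Result<PathBuf> {
    let mut current = root.to_path_buf();
    for comp in rel.components() {
        match comp {
            Component::CurDir => continue,
            Component::Normal(name) => current.push(name),
            _ => {
                return Err(io::Error::new(
                    io::ErrorKind::PermissionDenied,
                    "`..` and absolute paths are not allowed",
                ))
            }
        }
        // symlink_metadata does not follow the final symlink of `current`.
        let meta = fs::symlink_metadata(&current)?;
        if meta.file_type().is_symlink() {
            return Err(io::Error::new(
                io::ErrorKind::PermissionDenied,
                format!("symlink in path: {}", current.display()),
            ));
        }
    }
    Ok(current)
}
```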

view this post on Zulip raskyld (May 29 2024 at 19:01):

Dan Gohman said:

And for any option, it would require finding someone who can speak authoritatively on all Windows issues, as I have chosen as a life goal to never learn about how Windows filesystem paths work in sufficient detail.

That's a healthy way of life :praise:

view this post on Zulip Robin Brown (May 29 2024 at 19:05):

For the moment, wow runs on any platform as long as it's Unix. The next targets I'm actually interested in are virtual workspaces (e.g. the browser). I'm honestly not sure Windows support is even in the medium-term future.

view this post on Zulip raskyld (May 29 2024 at 19:14):

What I was thinking about is adding a hook, there: https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasi/src/host/filesystem.rs#L591

That is, after path resolution, once we have a resolved std File, pass it as an argument to the user-provided policy (which could be a closure or a trait object) and let the policy return a decision (allow / deny). That way, we benefit from File being almost platform-agnostic, which makes the policies agnostic as well.

Of course, if we later want to prevent the resolution from happening at all, we could add another hook, giving something like:
pre_open_at and post_open_at.

What I don't know is whether it should be done in wasmtime only, or implemented as a cap-std OpenOptions or something that would be "attached" to the capability handle (i.e. cap_std::fs::Dir).

WDYT?
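A sketch of how the two hooks could be expressed as a trait object, with default methods so an embedder only overrides what it needs. OpenHooks, Decision, and guarded_open are hypothetical names, and the hook signatures are assumptions, not anything wasmtime-wasi exposes. The post-open hook receives the already-opened std::fs::File, so it can inspect metadata() without re-resolving the path. Plain std::fs stands in for a cap-std Dir here, so the sketch has no sandboxing of its own.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Hypothetical policy decision.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

/// Hypothetical hook points around an open-at call.
pub trait OpenHooks {
    /// Called with the guest-supplied relative path, before anything is opened.
    fn pre_open_at(&self, _rel: &Path) -> Decision {
        Decision::Allow
    }
    /// Called with the opened file, after path resolution has happened.
    fn post_open_at(&self, _file: &File) -> Decision {
        Decision::Allow
    }
}

/// Open `root/rel` read-only, consulting both hooks. If the post-open hook
/// denies, the file is dropped (and thereby closed) before returning an error.
pub fn guarded_open(root: &Path, rel: &Path, hooks: &dyn OpenHooks) -> io::Result<File> {
    let denied = || io::Error::new(io::ErrorKind::PermissionDenied, "denied by policy");
    if hooks.pre_open_at(rel) == Decision::Deny {
        return Err(denied());
    }
    let file = File::open(root.join(rel))?;
    if hooks.post_open_at(&file) == Decision::Deny {
        return Err(denied());
    }
    Ok(file)
}
```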

view this post on Zulip Ralph (May 29 2024 at 20:13):

fortunately, there is a company that knows something you've chosen not to know about Windows.

view this post on Zulip Ralph (May 29 2024 at 20:13):

and it is not me, either. phew

view this post on Zulip Dan Gohman (May 29 2024 at 21:29):

@raskyld You likely also need to prevent users from using a rename to replace the contents of one file with another, and rename operates directly on paths, so there is no open and no std File gets created.
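Because rename never produces a File, a policy covering it has to be consulted with paths. A minimal sketch, assuming a hypothetical guarded_rename helper and a policy closure that sees both source and destination; checking the destination matters because renaming a file onto the protected name replaces its contents without ever opening it. The comparison in the example policy is lexical, so the aliasing caveats from earlier still apply.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: rename `root/from` to `root/to` only if the
/// path-based policy `(from, to) -> allowed?` permits it.
pub fn guarded_rename(
    root: &Path,
    from: &Path,
    to: &Path,
    policy: &dyn Fn(&Path, &Path) -> bool,
) -> io::Result<()> {
    if !policy(from, to) {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "rename denied by policy",
        ));
    }
    fs::rename(root.join(from), root.join(to))
}
```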

view this post on Zulip raskyld (May 30 2024 at 18:05):

Dan Gohman said:

raskyld You likely also need to prevent users from using a rename to replace the contents of one file with another, and rename operates directly on paths, so there is no open and no std File gets created.

You are right!
I think it all comes down to what attributes we want the policies to operate on.
Paths are indeed the most "straightforward" one, but also the most complex to get right, due to aliasing rules and the fact that each platform resolves paths differently. This would make writing correct policies error-prone and challenging.

I think the path should definitely be passed to the policy engine (most likely a closure, unless you have another idea? A CEL interpreter? https://github.com/google/cel-spec), but we may also consider passing Metadata, which would allow users to use Unix inodes or the Windows volume serial number + file index to identify the files they want to protect.

EDIT: If I am correct, cap-std calls stat on both paths during a rename.
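As a sketch of the identity idea on Unix: std::os::unix::fs::MetadataExt exposes dev() and ino(), and the pair identifies a file on the host, so a hard link to the protected file has the same identity even though its path differs. The Windows counterparts (volume_serial_number and file_index on std::os::windows::fs::MetadataExt) are, as far as I know, still unstable, so they are omitted. FileId is a hypothetical helper name.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical host file identity: (device, inode). Unix-only in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct FileId {
    dev: u64,
    ino: u64,
}

#[cfg(unix)]
impl FileId {
    /// Identity of whatever `path` currently resolves to (symlinks followed).
    pub fn of(path: &Path) -> io::Result<FileId> {
        use std::os::unix::fs::MetadataExt;
        let meta = fs::metadata(path)?;
        Ok(FileId { dev: meta.dev(), ino: meta.ino() })
    }
}
```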

view this post on Zulip Dan Gohman (May 30 2024 at 18:37):

Even inode numbers come with some subtle concerns. Something as innocuous as editing a config file in vim changes the config file's inode number. And on Windows there are caveats like "In some cases, the file ID for a file can change over time." (source)
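The vim effect comes from editors saving by creating a new file rather than overwriting the old one in place (in vim this depends on settings such as backupcopy). A common variant is write-to-temp-then-rename, sketched below for Unix with a hypothetical atomic_save helper: the temporary file exists alongside the old one before the rename, so it necessarily has a different inode, and after the rename the protected name points at that new inode.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: save `contents` to `path` by writing a sibling temp
/// file and renaming it over `path`. The rename is atomic on POSIX
/// filesystems, but afterwards `path` names a *new* file with a new inode.
pub fn atomic_save(path: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp-save");
    fs::write(&tmp, contents)?;
    fs::rename(&tmp, path)
}
```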

view this post on Zulip Dan Gohman (May 30 2024 at 19:02):

It turns out that cap-std does not call stat on either path in a rename.

view this post on Zulip raskyld (May 30 2024 at 19:23):

Dan Gohman said:

It turns out that cap-std does not call stat on either path in a rename.

Sorry, I should have linked the source as well: https://github.com/bytecodealliance/cap-std/blob/main/cap-primitives/src/fs/rename.rs#L30

It seemed to me that it does call stat, but I may be misunderstanding the linked code.

Dan Gohman said:

Even inode numbers come with some subtle concerns. Something as innocuous as editing a config file in vim changes the config file's inode number. And on Windows there are caveats like "In some cases, the file ID for a file can change over time." (source)

So the only really "secure" way would be to resolve the file you want to protect, then resolve the path requested by the guest, and compare their identifiers to see whether they resolve to the same file. You couldn't just store the identifier at the start of the process, because there is no guarantee it stays stable over time? :tear:

For example, for wow, it is paramount to protect ./wow.kdl, so each hook invocation should resolve ./wow.kdl, then resolve the path requested by the guest (e.g. wow.kdl.symlink, which points to the same inode as ./wow.kdl) and compare their identifiers to reach a decision (allow/deny).

At the end of the day, that just means it is the responsibility of the embedder to write correct policies, but platform behaviour makes that kind of complicated :sweat_smile:
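A sketch of that per-invocation comparison on Unix: the protected path and the guest's requested path are both resolved afresh (following symlinks) on every check, and the decision compares (dev, ino) pairs, which also catches hard links. is_protected is a hypothetical helper over plain std::fs. It narrows but does not close the race window, since either file can change between this check and the actual operation.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// (device, inode) of whatever `path` resolves to right now.
#[cfg(unix)]
fn identity(path: &Path) -> io::Result<(u64, u64)> {
    use std::os::unix::fs::MetadataExt;
    let m = fs::metadata(path)?; // follows symlinks
    Ok((m.dev(), m.ino()))
}

/// Hypothetical helper: true if `root/requested` currently resolves to the
/// same host file as `root/protected`. Both are re-resolved on every call,
/// because the protected file's identity can change (e.g. after an editor save).
#[cfg(unix)]
pub fn is_protected(root: &Path, protected: &Path, requested: &Path) -> io::Result<bool> {
    // An error here (e.g. the protected file was deleted) is left to the caller.
    let target = identity(&root.join(protected))?;
    match identity(&root.join(requested)) {
        Ok(id) => Ok(id == target),
        // A path that does not currently resolve cannot alias the protected file.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false),
        Err(e) => Err(e),
    }
}
```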

view this post on Zulip raskyld (May 30 2024 at 19:27):

At the end of the day, we could also check whether we can write abstractions to help embedders write correct policies, but trying to write platform-agnostic helpers for file identification may lead us to learn.microsoft.com, and so, to insanity. :upside_down:

view this post on Zulip bjorn3 (May 30 2024 at 19:42):

Can you move wow.kdl out of the directory that should be writable by the wasm module? For example, to its parent directory, or move all writable files to a subdirectory.

view this post on Zulip Robin Brown (May 30 2024 at 19:47):

the idea is that the wow.kdl defines the workspace in much the same way as a .python-version (pyenv), package.json, etc. do and makes tools available in that same directory so that you can clone it, install, and start using them right there. I don't think it's realistic to ask people to add a layer of nesting to their repos for this.

view this post on Zulip bjorn3 (May 30 2024 at 20:05):

Is the set of files in the root of the workspace expected to stay constant for the duration of the wasi program or is the wasi program expected to add/remove files from the workspace? If the former you could enumerate all files and directories at the root at startup and individually give the wasi program permission to access them all with the exception of wow.kdl.
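The enumeration step of that suggestion could be sketched as below; enumerate_root and RootEntries are hypothetical names. How each pair is handed to the runtime (for example via WasiCtxBuilder::preopened_dir, whose exact signature has changed between wasmtime versions) is left out. One assumption worth flagging: WASI preopens are directories, so in this sketch root-level regular files are only listed, not turned into preopens, and anything created at the root later is not covered.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical result: root entries split by kind, excluded names removed.
#[derive(Debug, Default)]
pub struct RootEntries {
    /// Candidate directory preopens: (host path, guest path).
    pub dirs: Vec<(PathBuf, String)>,
    /// Regular files at the root (preopens are directories, so these
    /// cannot be handed over one by one as preopens).
    pub files: Vec<PathBuf>,
}

/// Hypothetical helper: enumerate `root` once at startup, skipping `exclude`.
pub fn enumerate_root(root: &Path, exclude: &[&str]) -> io::Result<RootEntries> {
    let mut out = RootEntries::default();
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let name = entry.file_name();
        if exclude.iter().any(|ex| name == **ex) {
            continue;
        }
        // file_type() does not follow symlinks, so symlinks land in neither list.
        let ft = entry.file_type()?;
        if ft.is_dir() {
            out.dirs.push((entry.path(), name.to_string_lossy().into_owned()));
        } else if ft.is_file() {
            out.files.push(entry.path());
        }
    }
    // read_dir order is unspecified; sort for deterministic output.
    out.dirs.sort_by(|a, b| a.1.cmp(&b.1));
    out.files.sort();
    Ok(out)
}
```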

view this post on Zulip Robin Brown (May 30 2024 at 20:14):

It is expected to add files (e.g. a compiler) and theoretically move/remove files (though I don't have examples in mind).

view this post on Zulip bjorn3 (May 30 2024 at 20:15):

Would a compiler place them in a subdirectory? If so giving full access to this subdirectory should be fine.

view this post on Zulip Lann Martin (May 30 2024 at 20:22):

and just to make it explicit in case anyone thinks these concerns are too pessimistic/picky: there have been - and continue to be - many vulnerabilities due to filesystem races and file identity confusion

view this post on Zulip Robin Brown (May 30 2024 at 20:24):

I would want it to be possible for someone to run my-compiler -i foo.my -o foo.wasm if they wanted to.

That said, I could see doing something clever like giving them a different directory (temp OS dir or virtual) that contains all the files that were in the workspace (via symlinks or references of some kind) except for wow.kdl. Then at the end of the execution copy over any new files into the workspace.

Alternatively, we could go all in on separate source and target areas (though it makes things like formatters impossible) or focus wow on completely virtual workspaces where we can make the file-system behave nicely and be completely portable (maybe with a way to sync things between the real and virtual file-systems?).
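Robin's shadow-directory idea could be sketched roughly as follows, restricted to top-level regular files for brevity; build_shadow and sync_back are hypothetical helpers. The sketch uses hard links where possible and falls back to copying, since hard links fail across filesystems (e.g. when the temp dir is on tmpfs). Hard links rather than symlinks are used on the assumption that cap-std-backed preopens refuse to follow symlinks whose targets lead outside the preopened directory. The sync-back step is Unix-only because it compares inode numbers: copying a hard-linked file onto itself would truncate it.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: populate an initially empty `shadow` with the
/// top-level regular files of `workspace`, except `exclude`.
/// Hard links are tried first; copying is the fallback.
pub fn build_shadow(workspace: &Path, shadow: &Path, exclude: &str) -> io::Result<()> {
    fs::create_dir_all(shadow)?;
    for entry in fs::read_dir(workspace)? {
        let entry = entry?;
        if entry.file_name() == *exclude || !entry.file_type()?.is_file() {
            continue;
        }
        let dst = shadow.join(entry.file_name());
        if fs::hard_link(entry.path(), &dst).is_err() {
            fs::copy(entry.path(), &dst)?;
        }
    }
    Ok(())
}

/// Hypothetical helper: copy back top-level regular files that are new or
/// were replaced (different inode) in `shadow`. `exclude` is never written,
/// and files deleted in the shadow are not deleted in the workspace.
#[cfg(unix)]
pub fn sync_back(shadow: &Path, workspace: &Path, exclude: &str) -> io::Result<()> {
    use std::os::unix::fs::MetadataExt;
    for entry in fs::read_dir(shadow)? {
        let entry = entry?;
        if entry.file_name() == *exclude || !entry.file_type()?.is_file() {
            continue;
        }
        let src_meta = entry.metadata()?;
        let dst = workspace.join(entry.file_name());
        let same_file = match fs::metadata(&dst) {
            Ok(m) => m.dev() == src_meta.dev() && m.ino() == src_meta.ino(),
            Err(_) => false,
        };
        // Hard-linked files already share contents with the workspace.
        if !same_file {
            fs::copy(entry.path(), &dst)?;
        }
    }
    Ok(())
}
```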

view this post on Zulip Robin Brown (May 30 2024 at 20:26):

In the virtual option, maybe you'd have to drop into a virtual shell and when you do it copies the state of the workspace into a virtual file system. Then you run your wow commands against this and somewhere (maybe as you go maybe at the end?) you apply changes back to the real file-system/repo.

view this post on Zulip Lann Martin (May 30 2024 at 22:13):

It seems like it should be possible to have a "hybrid virtual filesystem" where directories (entries) are virtualized but files aren't. I think that would serve a lot of purposes like these.
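One way to picture the hybrid idea: the directory structure lives in memory, and only the leaves point at host files. A minimal sketch with hypothetical names (Node, resolve) that models lookup only, not the WASI descriptor plumbing. Because directories are virtual, ".." and directory symlinks simply don't exist, and a file left out of the tree (such as wow.kdl) is unreachable by construction; a real implementation would still want to open each host leaf without following a symlink.

```rust
use std::collections::BTreeMap;
use std::path::{Component, Path, PathBuf};

/// Hypothetical hybrid tree: directories are in-memory maps, files are host paths.
#[derive(Debug)]
pub enum Node {
    Dir(BTreeMap<String, Node>),
    HostFile(PathBuf),
}

impl Node {
    /// Walk a guest path through the virtual directories. Returns the host
    /// path of a file leaf, or None if a component is missing, is `..` or
    /// absolute, or tries to descend into a file.
    pub fn resolve(&self, guest: &Path) -> Option<&Path> {
        let mut node = self;
        for comp in guest.components() {
            let name = match comp {
                Component::CurDir => continue,
                Component::Normal(n) => n.to_str()?,
                _ => return None,
            };
            match node {
                Node::Dir(children) => node = children.get(name)?,
                Node::HostFile(_) => return None,
            }
        }
        match node {
            Node::HostFile(p) => Some(p.as_path()),
            Node::Dir(_) => None, // directories have no host counterpart
        }
    }
}
```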

view this post on Zulip Lann Martin (May 30 2024 at 22:17):

It would certainly be handy for some code I maintain where I currently copy (or hardlink, where possible) files out of a content-addressed dir into a tempdir just to reconstruct the file tree for wasi

view this post on Zulip Robin Brown (May 31 2024 at 04:20):

I wonder if that could even be done lazily as guests attempt to access paths

view this post on Zulip raskyld (May 31 2024 at 07:21):

I have been watching this issue: https://github.com/bytecodealliance/cap-std/issues/352

I think it would serve our purpose to be able to create a Dir that is not a direct handle to a physical directory but to a virtual mapping defined programmatically by the embedder. This would allow us to map the root workspace dir 1:1, with the exception that wow.kdl wouldn't be resolvable.

view this post on Zulip raskyld (Jun 01 2024 at 09:35):

Hi all! The weekend is finally here :relieved:

@Dan Gohman I've seen on the linked issue that you have some concerns about implementing such a virtual Dir in cap_std, which is understandable because it would break the current API. If we were to create a new crate for that, do you have an idea how much work it would represent in the wasmtime-wasi crate?

view this post on Zulip Dan Gohman (Jun 04 2024 at 14:39):

@raskyld (the stat calls here are conditional with #[cfg(racy_asserts)], which is an option used in testing and fuzzing but not enabled in normal builds)
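For readers unfamiliar with the pattern: code under a custom #[cfg(...)] name is only compiled when the flag is passed (e.g. RUSTFLAGS="--cfg racy_asserts"), so in normal builds the extra checks and their syscalls are simply absent. A sketch of the shape of such a check; rename_checked is hypothetical and is not cap-std's actual code. Newer compilers may warn about an unexpected cfg name unless it is declared.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical rename wrapper with a debug-only sanity check.
pub fn rename_checked(from: &Path, to: &Path) -> io::Result<()> {
    fs::rename(from, to)?;
    // Compiled only under `--cfg racy_asserts`. It is "racy" because another
    // process could remove or replace `to` between the rename and this stat.
    #[cfg(racy_asserts)]
    {
        assert!(
            fs::symlink_metadata(to).is_ok(),
            "destination should exist after rename"
        );
    }
    Ok(())
}
```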

view this post on Zulip Dan Gohman (Jun 04 2024 at 14:39):

It sounded like there might be a way to use https://learn.microsoft.com/en-us/windows/win32/devnotes/ntopendirectoryobject to obtain a handle representing a unified root containing all the Windows drive letters. If that's what you want, and if that turns out to be feasible, then it should be straightforward to implement within cap-std and have everything else "just work".

view this post on Zulip raskyld (Jun 05 2024 at 07:30):

I was quoting the issue, but what I had in mind was more general: in cap-std, a Dir maps directly to a std File.
What about a feature allowing wasmtime users to build a virtual dir to back WASI preopens that doesn't map directly to a file? It would be a virtual tree whose nodes map either to another virtual tree or to a real host fd/handle.

In the case of wow, we would just need to map the virtual dir to the real host dir, with a filter that avoids mapping wow.kdl at all. I don't have all the implementation details, but I am looking for a starting point to try to find a solution to the problem.
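A sketch of that filter for the wow case: a view that mirrors a host directory lazily, mapping names only when the guest asks for them, and refusing a deny-list of root-level entries. FilteredRoot and map_path are hypothetical names. The comparison folds ASCII case as a nod to case-insensitive hosts, but it does not handle Unicode normalization, and a real implementation would still need component-at-a-time resolution so that a symlink elsewhere in the tree cannot point back at the hidden file.

```rust
use std::path::{Component, Path, PathBuf};

/// Hypothetical lazily resolved view of `host_root` that hides some root-level names.
pub struct FilteredRoot {
    host_root: PathBuf,
    hidden: Vec<String>,
}

impl FilteredRoot {
    pub fn new(host_root: impl Into<PathBuf>, hidden: &[&str]) -> Self {
        FilteredRoot {
            host_root: host_root.into(),
            hidden: hidden.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Map a guest path to a host path, or None if it is hidden or invalid.
    /// Nothing is opened here; the caller opens the result on demand.
    pub fn map_path(&self, guest: &Path) -> Option<PathBuf> {
        let mut out = self.host_root.clone();
        let mut depth = 0usize;
        for comp in guest.components() {
            match comp {
                Component::CurDir => {}
                Component::Normal(name) => {
                    let s = name.to_str()?;
                    // Only root-level entries are filtered in this sketch.
                    if depth == 0 && self.hidden.iter().any(|h| h.eq_ignore_ascii_case(s)) {
                        return None;
                    }
                    out.push(name);
                    depth += 1;
                }
                _ => return None, // `..`, absolute paths, prefixes
            }
        }
        Some(out)
    }
}
```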

view this post on Zulip raskyld (Jun 14 2024 at 06:51):

Hey all, small :up: , what do you think about this approach?

view this post on Zulip Dan Gohman (Jun 14 2024 at 18:53):

I'm not sure. We've tended to avoid having elaborate configurable filesystem APIs at the Wasmtime host API level, because a lot of use cases should be better off implemented by virtualization via guest code. But maybe some of these hybrid schemes point to a use case that can be more effectively done in host code, so perhaps we should consider it.


Last updated: Nov 22 2024 at 16:03 UTC