Hi! Let's say you would like to permit filesystem access to a wasi:cli component. To do so, you preopen a directory, which (with the current wasmtime-wasi implementation, IIUC) gives the component a handle backed by a cap_std::fs::Dir.
Now, let's say you need more control over the invocations made on this capability. In my specific use-case, I would like to deny access to a specific file beneath the preopened Dir (whether by direct access or as the result of symlink resolution(s)).
It seems like it is not possible to do something like that ATM, and I would need to re-implement filesystem host-functions to do so.
The WasiCtxBuilder::socket_addr_check() func allows the caller to implement advanced access policies through the usage of closures.
Do you think such a mechanism could be implementable / desirable (more likely in cap-fs-ext) for wasmtime? If not, do you have any suggestion on how to deal with the problem? :pray:
Thanks!
@Dan Gohman we talked about this a couple of weeks ago in relation to wow. What do you think would be the best way to move forward with this kind of policy feature? Also, I think we'd want to make the file read-only specifically.
that would be the preferred approach to reading in auth tokens for example....
We don't have anything that can do "deny access to a specific file" today.
In addition to symlinks and .. paths, there's also case sensitivity, Unicode normalization in some host filesystems, and special rules on the Wonderful World of Windows (example).
So in theory this might be doable, but I think it would be tricky to get right.
Windows.... sigh
It will be tricky for sure, the question to me is about at what layer tools should solve this.
As I see it there are 4 options
wow
exec the command as a user who has read only permissions on the file)
What direction would you recommend @Dan Gohman?
The OS level is probably going to be really inconvenient, because you have to have elevated privileges to create new users.
The guest level may be complex because if you don't have the ability to directly query what kind of host filesystem you're on, it may be tricky to know which path aliasing rules apply.
And any layer above the OS will need some way to detect directory symlinks. For example, in a path like a/b/c/d/e/f/g/h/secrets.txt, any of those path components could be a symlink, and there's no way to know without openat-ing them one path component at a time (because O_NOFOLLOW only applies to the last component of a path string). So maybe the options look like:
File would need to change from being a trivial wrapper around a std::fs::File to also having a list of prohibited paths, and we'd need to have some form of Unicode dependency in order to do case conversion and Unicode normalization. Maybe it could be made an optional feature, to avoid encumbering cap-std's other users?
openat-one-at-a-time logic.
And for any option, it would require finding someone who can speak authoritatively on all Windows issues, as I have chosen as a life goal to never learn about how Windows filesystem paths work in sufficient detail.
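To make the "one path component at a time" point concrete, here is a minimal sketch using only the Rust standard library. The function name first_symlink_component is hypothetical and is not part of cap-std or wasmtime. It checks each prefix with an lstat-style call, so unlike a real openat/O_NOFOLLOW walk it is racy and only illustrates why every component has to be inspected.

```rust
use std::io;
use std::path::{Component, Path, PathBuf};

/// Walks `rel` beneath `root` one component at a time and returns the first
/// prefix (relative to `root`) that is a symlink, if any.
///
/// Hypothetical helper for illustration only. It is path-based and therefore
/// racy: a robust implementation would hold a directory handle and `openat`
/// each component with `O_NOFOLLOW` instead.
pub fn first_symlink_component(root: &Path, rel: &Path) -> io::Result<Option<PathBuf>> {
    let mut prefix = PathBuf::new();
    for component in rel.components() {
        match component {
            Component::Normal(name) => prefix.push(name),
            Component::CurDir => continue,
            // Reject `..`, absolute paths, and Windows prefixes outright.
            _ => {
                return Err(io::Error::new(
                    io::ErrorKind::InvalidInput,
                    "only plain relative components are supported",
                ))
            }
        }
        // `symlink_metadata` does not follow a symlink in the last component,
        // and every earlier component was already checked on a prior iteration.
        let meta = std::fs::symlink_metadata(root.join(&prefix))?;
        if meta.file_type().is_symlink() {
            return Ok(Some(prefix));
        }
    }
    Ok(None)
}
```

For a path like a/b/c/secrets.txt where a/b/c is a symlink, this would report a/b/c. It says nothing about case folding or Unicode normalization, which would still need separate handling.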
Dan Gohman said:
And for any option, it would require finding someone who can speak authoritatively on all Windows issues, as I have chosen as a life goal to never learn about how Windows filesystem paths work in sufficient detail.
That's a healthy way of life :praise:
For the moment, wow runs on any platform as long as it's unix. The next targets I'm actually interested in are virtual workspaces (e.g. the browser). I'm honestly not sure if Windows support is even in the medium-term future.
What I was thinking about is adding a hook, there: https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasi/src/host/filesystem.rs#L591
That is, after path resolution, when we have a resolved std File, pass it to the user-provided policy (which could be a closure or a trait object) as an argument. Then let it return a decision (allow / deny). That way, we benefit from the fact that File is almost platform-agnostic to make the policies platform-agnostic as well.
Of course, in the future, if we want to prevent the resolution from happening at all, we can have another hook, something like: pre_open_at and post_open_at.
What I don't know is whether it should be done in wasmtime only, or if it should be implemented as a cap-std OpenOptions or something that would be "attached" to the capability handle (i.e. cap_std::fs::Dir).
WDYT?
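A rough sketch of what such a post-resolution hook could look like, using only the standard library. Everything here (Decision, OpenPolicy, open_with_policy) is a hypothetical name for illustration; none of it exists in wasmtime-wasi or cap-std, and the real host code opens through a Dir rather than File::open.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Outcome returned by a policy (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

/// Hypothetical policy signature: the requested path plus the already
/// resolved file, so the policy can inspect either one.
pub type OpenPolicy = Box<dyn Fn(&Path, &File) -> Decision + Send + Sync>;

/// Stand-in for a host open implementation with a post-open hook.
/// The file is opened first, then handed to the policy for a verdict.
pub fn open_with_policy(path: &Path, policy: &OpenPolicy) -> io::Result<File> {
    let file = File::open(path)?;
    match policy(path, &file) {
        Decision::Allow => Ok(file),
        Decision::Deny => Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "denied by policy",
        )),
    }
}
```

An embedder would pass a closure such as one that returns Decision::Deny when the file's metadata matches the protected file. A pre_open_at variant would take only the path and run before File::open.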
fortunately, there is a company that knows something you've chosen not to know about Windows.
and it is not me, either. phew
@raskyld You likely also need to prevent users from using a rename to replace the contents of one file with another, and rename operates directly on paths, so there is no open and no std File gets created.
Dan Gohman said:
raskyld You likely also need to prevent users from using a rename to replace the contents of one file with another, and rename operates directly on paths, so there is no open and no std File gets created.
You are right!
I think it all comes down to what are the attributes we want the policies to operate on.
The paths are the most "straight-forward" one indeed, but also the most complex to get right due to aliasing rules and the fact each platform resolves paths differently. This would make writing correct policies error-prone and challenging.
I think the path should definitely be passed to the policy engine (more likely a closure, unless you have another idea? a CEL interpreter? https://github.com/google/cel-spec) but we may also consider passing Metadata, which would allow the users to use Unixes' inodes or Windows Volume Serial + File Index to identify the files they want to protect.
EDIT: If I am correct, cap-std calls stat on both paths during a rename.
Even inode numbers come with some subtle concerns. Something as innocuous as editing a config file in vim changes the config file's inode number. And on Windows there are caveats like "In some cases, the file ID for a file can change over time." (source)
It turns out that cap-std does not call stat on either path in a rename.
Dan Gohman said:
It turns out that cap-std does not call stat on either path in a rename.
Sorry, I should have linked source as well: https://github.com/bytecodealliance/cap-std/blob/main/cap-primitives/src/fs/rename.rs#L30
It seemed to me that it does stat but I may misunderstand the linked code.
Dan Gohman said:
Even inode numbers come with some subtle concerns. Something as innocuous as editing a config file in vim changes the config file's inode number. And on Windows there are caveats like "In some cases, the file ID for a file can change over time." (source)
So the only real "secure" way would be to resolve the file you want to protect, then resolve the path requested by the guest and compare their identifier to see if they resolve to the same. You couldn't just store the identifier at the start of the process because we have no guarantee they stay stable in time? :tear:
For example, for wow, it is paramount to protect ./wow.kdl, so each hook invocation should resolve ./wow.kdl, then resolve the path requested by the guest (e.g. wow.kdl.symlink, which points to the same inode as ./wow.kdl) and compare their identifiers to deliver a decision (allow/deny).
At the end of the day, that just means it is the responsibility of the embedder to write correct policies.. but platform behaviour makes that kind of complicated :sweat_smile:
At the end of the day, we could also check whether we can write abstractions to help the embedders write correct policies, but trying to write platform-agnostic helpers to deal with file authentication may lead us to learn.microsoft.com, and so, to insanity. :upside_down:
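The "resolve both, then compare identifiers" check described above could be sketched like this on Unix, with the standard library only. same_file is a hypothetical helper name. The comparison is re-done on every call because identifiers may change over time, and it is still subject to races between the check and the actual operation.

```rust
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns true if both paths currently resolve (following symlinks) to the
/// same file, identified by the (device, inode) pair.
///
/// Hypothetical helper, Unix only. Nothing is cached: both paths are
/// resolved at call time, since inode numbers are not stable across edits
/// that replace the file.
pub fn same_file(a: &Path, b: &Path) -> io::Result<bool> {
    let meta_a = std::fs::metadata(a)?;
    let meta_b = std::fs::metadata(b)?;
    Ok(meta_a.dev() == meta_b.dev() && meta_a.ino() == meta_b.ino())
}
```

With this, a hook could deny a request when same_file(requested, "./wow.kdl") is true, which covers both symlinks and hard links to the protected file. A Windows equivalent would need the volume serial number and file index instead.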
Can you move wow.kdl out of the directory that should be writable by the wasm module? For example, move it to its parent directory, or move all writable files to a subdirectory.
the idea is that the wow.kdl defines the workspace in much the same way as a .python-version (pyenv), package.json, etc. do and makes tools available in that same directory so that you can clone it, install, and start using them right there. I don't think it's realistic to ask people to add a layer of nesting to their repos for this.
Is the set of files in the root of the workspace expected to stay constant for the duration of the wasi program or is the wasi program expected to add/remove files from the workspace? If the former you could enumerate all files and directories at the root at startup and individually give the wasi program permission to access them all with the exception of wow.kdl.
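The enumerate-at-startup idea could be sketched as follows, standard library only. entries_to_grant is a hypothetical name, and how each entry is then granted to the guest is left out. The name comparison is byte-exact, so on a case-insensitive host filesystem this alone would not be sufficient, and entries created after startup are not covered.

```rust
use std::ffi::OsStr;
use std::io;
use std::path::{Path, PathBuf};

/// Lists every entry directly under `root` except the one named `denied`.
///
/// Hypothetical helper: the embedder would grant access to each returned
/// entry individually instead of granting access to `root` itself.
pub fn entries_to_grant(root: &Path, denied: &OsStr) -> io::Result<Vec<PathBuf>> {
    let mut granted = Vec::new();
    for entry in std::fs::read_dir(root)? {
        let entry = entry?;
        // Byte-exact comparison of the file name (an assumption of this sketch).
        if entry.file_name().as_os_str() != denied {
            granted.push(entry.path());
        }
    }
    // `read_dir` order is unspecified, so sort for a deterministic result.
    granted.sort();
    Ok(granted)
}
```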
It is expected to add files (e.g. a compiler) and theoretically move/remove files (though I don't have examples in mind).
Would a compiler place them in a subdirectory? If so giving full access to this subdirectory should be fine.
and just to make it explicit in case anyone thinks these concerns are too pessimistic/picky: there have been - and continue to be - many vulnerabilities due to filesystem races and file identity confusion
I would want it to be possible for someone to run my-compiler -i foo.my -o foo.wasm if they wanted to.
That said, I could see doing something clever like giving them a different directory (temp OS dir or virtual) that contains all the files that were in the workspace (via symlinks or references of some kind) except for wow.kdl. Then at the end of the execution copy over any new files into the workspace.
Alternatively, we could go all in on separate source and target areas (though it makes things like formatters impossible) or focus wow on completely virtual workspaces where we can make the file-system behave nicely and be completely portable (maybe with a way to sync things between the real and virtual file-systems?).
In the virtual option, maybe you'd have to drop into a virtual shell and when you do it copies the state of the workspace into a virtual file system. Then you run your wow commands against this and somewhere (maybe as you go maybe at the end?) you apply changes back to the real file-system/repo.
It seems like it should be possible to have a "hybrid virtual filesystem" where directories (entries) are virtualized but files aren't. I think that would serve a lot of purposes like these.
It would certainly be handy for some code I maintain where I currently copy (or hardlink, where possible) files out of a content-addressed dir into a tempdir just to reconstruct the file tree for wasi
I wonder if that could even be done lazily as guests attempt to access paths
I have been watching this issue: https://github.com/bytecodealliance/cap-std/issues/352
I think it would serve our purpose to be able to create a Dir that is not a direct handle to a physical directory but to a virtual mapping defined programmatically by the embedder. This would allow us to map the root workspace dir 1:1 with the exception that wow.kdl wouldn't be resolvable.
Hi all! Week-end is there finally :relieved:
@Dan Gohman I've seen on the linked issue that you have some concerns about implementing such a virtual Dir in cap_std, which is understandable because it would break the current API. If we were to create a new crate for that, do you have an idea of how much work that would represent in the wasmtime-wasi crate?
@raskyld (the stat calls here are conditional with #[cfg(racy_asserts)], which is an option used in testing and fuzzing but not enabled in normal builds)
It sounded like there might be a way to use https://learn.microsoft.com/en-us/windows/win32/devnotes/ntopendirectoryobject to obtain a handle representing a unified root containing all the Windows drive letters. If that's what you want, and if that turns out to be feasible, then it should be straightforward to implement within cap-std and have everything else "just work".
I was quoting the issue but what I was thinking about was more general: in cap-std, a Dir directly maps to a std file.
What about a feature allowing wasmtime users to build a virtual dir to back WASI preopens that doesn't map to a file directly? It would be a virtual tree where nodes map to either another virtual tree or a real host's fd/handle.
In the case of wow, we would just need to map the virtual dir to the real host's dir with a filter that avoids mapping the wow.kdl at all. I don't have all the implementation details, but I am looking for a path to start trying to find a solution to the problem.
Hey all, small :up: , what do you think about this approach?
I'm not sure. We've tended to avoid having elaborate configurable filesystem APIs at the Wasmtime host API level, because a lot of use cases should be better off implemented by virtualization via guest code. But maybe some of these hybrid schemes point to a use case that can be more effectively done in host code, so perhaps we should consider it.
Last updated: Nov 22 2024 at 16:03 UTC