Like Calvin's topic, but for Python.
Please reply or DM me if you're interested in meeting and collaborating on Python guest component tooling.
If you already expressed interest in my earlier discussion of a Python guest binding generator in the #wit-bindgen stream, I'll assume you're interested in this, as well :)
Myself and @Daniel Macovei are interested in Python guest
We don't currently have a Python Bytecode Alliance project for Componentizing Python. It'd be great for this group to talk about what approaches they're using and how to make an equivalent to Componentize-JS.
Please post your availability here if you're planning to join us: https://www.when2meet.com/?19002194-nc5IB
Definitely interested.
I went ahead and scheduled this for Friday at noon ET / 9am PT. We'll meet at https://meet.jit.si/PythonGuestComponents-2023-03-10
DM me your email address if you'd like an email invitation. See you then!
Reminder: we're meeting at https://meet.jit.si/PythonGuestComponents-2023-03-10 in about 10 minutes. Agenda and notes here: https://hackmd.io/q5SWcHt1TWaYWcMtt-xI9g
See you soon!
I'd like to schedule the next Python Component Tooling meeting for Thursday at 1pm ET (10am PT). Let me know if you'd like to attend but that time doesn't work, in which case I can reschedule. Agenda and notes here: https://hackmd.io/4l5OFAwISZuXl6MtdxOlEA
@Brett Cannon FYI
I can make it, but I will have to move a (not critical) meeting to do so. So we can keep the time if it works for folks, but I won't complain if it moves either. :wink:
Folks who would like to attend: please indicate your availability here: https://www.when2meet.com/?19300681-boaTm. I'll move it if there's another time that works so Brett doesn't have to shuffle his schedule.
Looks like everyone's available at 3pm ET (noon PT) -- let's meet then at https://meet.jit.si/PythonComponentTooling-2023-03-23
:point_up: This is starting in a couple of minutes.
I forgot one part of the numpy / pandas work yesterday that was a pretty big missing piece. Numpy requires setjmp/longjmp, and one extension in pandas is C++ and requires exceptions (specifically __cxa_allocate_exception/__cxa_throw). I just stubbed them out for now to get them to compile. I don't know exactly when they get triggered during use, though. setjmp/longjmp has shown up in other work as well (Ruby, Lua), so we'll likely need a general solution for them at some point.
BTW, just for the last hour or so before I go on vacation, I thought I'd see what it might take to get SciPy compiled. This looks like a much bigger problem. Numpy could be compiled without the LAPACK/BLAS libraries, but it looks like SciPy requires them. And.... they need a FORTRAN compiler. I haven't found any information about compiling FORTRAN to WASM using anything but Emscripten.
Yep, we were once asked if we wanted to fund work to make a Fortran compiler work under WebAssembly :big_smile:
I'd be curious to know whether Flang or LFortran are mature enough to meet SciPy's needs, and whether they can be made to target wasm32-wasi.
Apparently LFortran can translate Fortran code to C++ code, which is interesting.
Would 3pm ET (noon PT) on Thursday work for the next Python Component Tooling meeting? If so, let's meet then at https://meet.jit.si/PythonGuestComponents-2023-04-06.
@Asen Alexandrov FYI
We'll meet at the above Jitsi room in ~45 minutes. Agenda and notes here: https://hackmd.io/kdXktZ8DQriSAvlm_YOfJw
I've learned a few more things about how dynamic linking might work, and what steps we might take next, from discussion with Luke Wagner, Alex Crichton, and others.
In particular, I didn't understand that components can already contain multiple core wasm modules, with the component defining how to link together their imports and exports. Together with toolchain conventions, possibly matching what's in the existing Emscripten support for dynamic linking, something like wit-component or componentize-py could synthesize the right glue to make `dlopen` work.
Suggested next steps are to dig into those Emscripten conventions and also to understand how Pyodide uses Emscripten's conventions.
Shall we meet on Thursday at 3pm ET (noon PT)? If that works for everyone, we'll meet at https://meet.jit.si/PythonGuestComponents-2023-05-04
Also, componentize-py is now feature-complete, i.e. it should be able to handle arbitrary WIT worlds. However, the generated bindings are not particularly ergonomic or idiomatic, so the big remaining TODO is to factor out the `wasmtime-py` type binding generator and reuse it. I'd also like some feedback on how to convert from Python exceptions to WIT `result`s and vice versa.
One more update: Jamey and I have a pretty solid plan for "dynamic" linking which I think will fit the Python ecosystem's needs very nicely. We should have an RFC up for feedback by Thursday.
By the way, I'm putting this stream on the highlight in the "Future work" of an article we'll publish today or tomorrow at WasmLabs. Let me know if you find this inappropriate and I will remove the reference - https://se2-bindings.wasm-labs.pages.dev/articles/wasm-host-to-python/#future-work
why would this be "inappropriate"? It's a good post about how you do this right now.
Ralph said:
why would this be "inappropriate"? It's a good post about how you do this right now.
By "inappropriate" I meant pointing people to this Zulip stream for further digging. I personally cannot think of a reason against this, but I prefer to ask before I put someone in the limelight.
Yeah no worries, this is a public Zulip instance and the intent is to have lots of folks take a look and discuss here, so no need to avoid linking it!
We'll be meeting in a few minutes at https://meet.jit.si/PythonGuestComponents-2023-05-04. Agenda and notes here: https://hackmd.io/vJeeNh1KSvq1449O5OqL7w
Oops, sorry, not in a few minutes -- I had it on my calendar wrong. It's two hours from now: noon PT / 3pm ET
Could you please grant permission to the hackmd doc?
Done; thanks for the reminder.
Joel Dice said:
We'll be meeting in a few minutes at https://meet.jit.si/PythonGuestComponents-2023-05-04. Agenda and notes here: https://hackmd.io/vJeeNh1KSvq1449O5OqL7w
I guess you shouldn't feel too bad about this, I missed the meeting because I had it in my calendar as 3pm Central...
Sorry for the confusion. I'm going to start using https://zulip.com/help/format-your-message-using-markdown#global-times from now on.
@Joel Dice, @Brett Cannon, @Jamey Sharp I missed the meeting, but I can give you an empirical answer to this
does wasi-sdk expose a C preprocessor symbol or something with the SDK version so we can extract it while building cpython/wheels?
It does not. I was looking for this a month ago, but ended up relying on the build script that sets up the SDK to also provide its version as a define/env var where it was later needed for packaging. I only found `__clang_version__` among the defines when building with wasi-sdk.
Question for the Python experts: What's the most idiomatic way to generate Python bindings for the following WIT world?
world foo {
  import foo: interface {
    variant error {
      oops,
      oh-no(string),
      yikes
    }
    bar: func(n: u32) -> result<u32, error>
  }
}
I'm thinking something like what `wasmtime-py` currently generates, extended to make `error` usable as an exception which may be raised:
from dataclasses import dataclass
from typing import Union

@dataclass
class ErrorOops:
    pass

@dataclass
class ErrorOhNo:
    value: str

@dataclass
class ErrorYikes:
    pass

@dataclass
class Error(Exception):
    value: Union[ErrorOops, ErrorOhNo, ErrorYikes]

# May raise `Error`
def bar(n: int) -> int: ...
The drawback of doing this is that `Error` doesn't appear in the type signature of `bar` due to https://peps.python.org/pep-0484/#exceptions, whereas it _would_ if we didn't try to "exception-ify" the `result`. I assume the above is more idiomatic than `def bar(n: int) -> Result[int, Error]`, though.
Not a Python expert, and not sure I grasp the question 100%. But I typically use docstrings to convey to users what kind of exceptions a function could throw, versus the statically typed `throws x` that Java provides. If I'm not mistaken, the base exception class already accepts a string, so `ErrorOhNo` would have that by default.
I structure my exceptions similar to the way airflow codebase does:
class Error(Exception):
    """Error Exception Type Binding"""
    pass

class ErrorOops(Error):
    pass

class ErrorOhNo(Error):
    pass

class ErrorYikes(Error):
    pass

def bar(n: int) -> int:
    """Function binding that may throw Exceptions [Error, ErrorOops, ErrorOhNo, ErrorYikes]"""
    ...
Then the type hints on bar would show the docstring.
Airflow Exception Definition File
I do believe python users would rather do:
try:
    bar(123)
except ErrorOops:
    pass
than
result, error = bar(123)
if isinstance(error, ErrorOops):
    ...  # exception case
Shannon Duncan (shadowcodex) said:
class Error(Exception):
    """Error Exception Type Binding"""
    pass

class ErrorOops(Error):
    pass

class ErrorOhNo(Error):
    pass

class ErrorYikes(Error):
    pass

def bar(n: int) -> int:
    """Function binding that may throw Exceptions [Error, ErrorOops, ErrorOhNo, ErrorYikes]"""
    ...
This feels right to me. I don't think making the exceptions dataclasses would be required. Python exceptions take any number of arguments, which are already accessible through the `args` attribute. Of course, there is no one true format for documenting the exceptions (I personally like the numpy docstring style, but there are others).
The reason I was using `@dataclass` is that `wasmtime-py` currently uses it when generating Python types from WIT `variant`s, `record`s, etc. and I'm trying to be consistent with that approach. Ideally we'd have some static type checking to verify that `ErrorOhNo` has a payload but `ErrorOops` does not, for example, and your IDE could warn you if you mixed them up.
Likewise, `wasmtime-py` uses `Union` when generating code for WIT `variant`s and I'm trying to be consistent with that. Could be that `wasmtime-py`'s generation could use improvement, though, so I'm certainly open to that. I have basically zero Python experience, so I'll defer to just about anybody on this :)
I'm hoping to dig into `wasmtime-py` this weekend. I'll try and keep an eye out for that and see if there are some improvements we can recommend. My plan is to ramp up over the next few weeks and start contributing to `wasmtime-py`.
`variant`s and `record`s are more formal data structures, so to me it makes sense for them to be dataclasses. Exceptions in Python are typically less formal. It seems like in most cases, they just get a message passed to them. I do wonder if making an Exception a dataclass would break any of the existing Exception class behaviors. I don't know off-hand. We may have to defer to @Brett Cannon on that one.
Yeah, where it gets interesting is when a `variant` is used as the `err` case of a `result` _and_ elsewhere, e.g. as a parameter to a function or a field in a `record`. So it may not _only_ be used as an exception.
I see that `wasmtime-py` defines this:
T = TypeVar('T')

@dataclass
class Ok(Generic[T]):
    value: T

E = TypeVar('E')

@dataclass
class Err(Generic[E]):
    value: E

Result = Union[Ok[T], Err[E]]
Perhaps only `Err` needs to extend `Exception`, in which case we don't need any `variant` or its cases to extend it. I.e. you can `raise Err(ErrorOhNo("trouble"))` but not `raise ErrorOhNo("trouble")`.
I played around a bit with dataclass and Exception. It doesn't appear to have any obvious ill effects. You still get an `args` attribute with the values. The `repr` value is slightly different because dataclass adds the names of the fields, but I don't see that as a problem. You might want to add `frozen=True` to the dataclass call so that the fields can't be written to. If you write to the `value` field, the `args` attribute no longer matches.
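A minimal sketch of the experiment described above, assuming a hypothetical `OhNo` exception with a single `value` field (the class name is illustrative, not something `wasmtime-py` generates). It leaves out `frozen=True` so the drift between `value` and `args` can be observed:

```python
from dataclasses import dataclass


@dataclass
class OhNo(Exception):
    """Hypothetical exception written as a dataclass with one payload field."""
    value: str


try:
    raise OhNo("trouble")
except OhNo as exc:
    caught = exc

# The positional argument is still recorded in `args` by BaseException.
args_at_raise = caught.args

# The dataclass-generated repr names the field.
repr_at_raise = repr(caught)

# Writing to the field does not update `args`, so the two drift apart.
caught.value = "changed"
args_after_write = caught.args
```

After the write, `caught.value` and `caught.args[0]` disagree, which is the mismatch that motivates the `frozen=True` suggestion.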
I don't think I have ever seen an exception written as a dataclass. Exceptions can have attributes on them, but for the common case where extra data isn't useful in code itself, the inheritance hierarchy conveys the important information and you provide a human-readable message.
And returning a `Result` type isn't done in Python; you raise exceptions as necessary and can document what exceptions you explicitly raise.
I'm also not sure what information you're trying to convey with your `error` variant. Am I to view each individual case as a bit of information for a larger error type, or is each its own type of error, grouped together for typing convenience? My brain reads it as the latter, so I would assume it would be something more like:
class Oops(Exception): pass

class OhNo(Exception):
    def __init__(self, message, value):
        self.value = value  # I don't see a name for the parameter to the `oh-no` variant.
        super().__init__(message)

class Yikes(Exception): pass
Now you could have an `error` base class that they all inherit from:
class Error(Exception): pass

class Oops(Error): pass

class OhNo(Error):
    ...

class Yikes(Error): pass
Having a common exception class for an overall API that has multiple, custom exceptions is common.
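A minimal sketch of how calling code might use such a hierarchy, assuming a hypothetical `bar` binding and an invented failure rule purely for illustration (none of this is generated by an existing tool):

```python
class Error(Exception):
    """Common base class for every failure this hypothetical API can raise."""


class Oops(Error):
    pass


class OhNo(Error):
    def __init__(self, value: str):
        super().__init__(value)
        self.value = value  # payload of the `oh-no` case


class Yikes(Error):
    pass


def bar(n: int) -> int:
    """Hypothetical binding that raises `Error` subclasses instead of returning a result."""
    if n == 0:
        raise Oops()
    if n > 100:
        raise OhNo(f"{n} is too large")
    return n * 2


def describe(n: int) -> str:
    try:
        return f"ok: {bar(n)}"
    except OhNo as exc:   # most specific handler first
        return f"oh-no: {exc.value}"
    except Error as exc:  # catches Oops, Yikes, and any future subclass
        return f"error: {type(exc).__name__}"
```

Callers who only care that something failed can catch `Error`; callers who need the payload can catch `OhNo` specifically.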
Thanks for the input, everybody. In case it wasn't clear: componentize-py needs to be able to generate Python bindings for arbitrary WIT files, and those WIT files aren't necessarily designed with Python (or any specific programming language) in mind. So when it gets something like this:
world foo {
  import foo: interface {
    variant error {
      oops,
      oh-no(string),
      yikes
    }
    record foo {
      x: u32,
      what: error
    }
    bar: func(n: u32) -> result<u32, error>
    baz: func(e: error) -> foo
  }
}
... it needs to do the best it can. What information is the `error` variant trying to convey? Who knows? Imagine somebody else wrote it and we have no idea what they were trying to convey. That's `componentize-py`'s perspective -- it gets WIT someone else wrote and generates Python bindings for it. So the question is: what's the most idiomatic Python code it can generate from WIT files which may be entirely un-Pythonic (e.g. using variants, records, u32s, or who knows what to represent errors)?
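To make the generator's position concrete, here is a toy sketch that turns a variant description into Python source in the style of the dataclass snippet earlier in the thread. The function names, the input shape, and the choice of a `Union` alias are all assumptions for illustration; this is not how componentize-py or `wasmtime-py` is actually implemented:

```python
from typing import List, Optional, Tuple


def camel(name: str) -> str:
    """Convert a kebab-case WIT name such as `oh-no` to `OhNo`."""
    return "".join(part.capitalize() for part in name.split("-"))


def generate_variant(name: str, cases: List[Tuple[str, Optional[str]]]) -> str:
    """Emit one dataclass per case plus a Union alias (hypothetical generator).

    Each case is (wit_case_name, python_payload_type_or_None).
    """
    prefix = camel(name)
    lines = ["from dataclasses import dataclass", "from typing import Union", ""]
    class_names = []
    for case, payload in cases:
        class_name = prefix + camel(case)
        class_names.append(class_name)
        body = f"    value: {payload}" if payload else "    pass"
        lines += ["@dataclass", f"class {class_name}:", body, ""]
    lines.append(f"{prefix} = Union[{', '.join(class_names)}]")
    return "\n".join(lines)


source = generate_variant(
    "error", [("oops", None), ("oh-no", "str"), ("yikes", None)]
)
```

The generator sees only names and payload types, so nothing in its input says whether `error` is meant as an exception or as ordinary data.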
I still feel like @Shannon Duncan (shadowcodex)'s overall exception hierarchy is the correct way, but maybe just add the dataclass features like you originally had to add formal definitions of the payload. As Brett mentioned, I've never seen exceptions as dataclasses before either, but they might be needed to make this work on both sides. While it's a little sketchy, maybe add an `args` property to the base exception class as well to return a tuple that contains the data value. That way `e.args` and `e.value` won't get out of sync.
I'm really interested in solving the "run untrusted Python code in a WASM sandbox inside my Python programs" problem. I wrote up some notes on what I'm looking to solve here: https://gist.github.com/simonw/b9a1f080714785b7ee16c7d04db12210
Short version: I want to be able to say "result = execute_untrusted_python(untrusted_code_string, memory_limit_in_bytes=8196, time_limit_in_seconds=1.0)" and get back the result of executing that code in a safe sandbox, with enforced memory and time limits.
@Kevin Smith @Brett Cannon @Joel Dice
Is the real challenge here that we only know it's an exception because the WIT says `error`? But if it says `e` or `problem` or some other random word, we would skip the exception stuff altogether?
To me it isn’t obvious yet how from a WIT we could generate any exception classes. We only know this case cause of how the variants are spelled, in future edge case they could label their error variant as X or something.
Does WIT have any formal way of handling error/exceptions/etc?
@Shannon Duncan (shadowcodex) the name is not relevant -- it's the fact that the type is used as the second type argument to `result`. I.e. any time we have `result<T, E>`, where `T` and `E` are types, we need to treat `E` as an "error" type.
So yes, WIT's formal way of representing failures is using `result`, similar to how Rust and ML-style languages do it.
Simon Willison said:
Short version: I want to be able to say "result = execute_untrusted_python(untrusted_code_string, memory_limit_in_bytes=8196, time_limit_in_seconds=1.0)" and get back the result of executing that code in a safe sandbox, with enforced memory and time limits.
You can do this to some extent now, but there are limitations. When you submit code to the WASM Python instance, you are running a completely separate Python instance from the original interpreter (including a completely separate standard library and installed packages). Using a package like wasmtime-py will allow you to run a python.wasm file inside your Python interpreter and it will be completely sandboxed (although you can allow file system access to specific directories if you wish). You will need to write one export function to execute the submitted Python code and return the result. There is an example very similar to this in `udf_impl.c` at https://github.com/singlestore-labs/python-wasi/tree/main/udf. I'd have to double-check the wasmtime-py API, but I'm pretty sure you can set memory limits. Timeouts would likely have to be done using async or threads in your application.
@Shannon Duncan (shadowcodex) That is a good point. It may be that wasmtime-py's way of doing this is best we can do.
Completely separate Python instance is exactly what I'm after - I want it to have access to the Python standard library, but I don't need it to have access to any of my other code other than what I pass into it
The problem I've been having with this is that I don't know very much C at all, so I've been hoping to stumble across an example that does exactly what I'm looking for - I'm confident I'm far from the only person who wants to solve this problem, "python in a sandbox" is a thing that's been wanted by the wider Python community for decades
@Simon Willison The UDF example I pointed to is pretty much what you want, but it does take some work to put the pieces together. Although, if you build python.wasm in that parent project, then run `build.sh` in the udf directory, you're pretty close to having it.
It's frustrating because I'm 100% sure this is possible using existing Python WASM runtimes and the python.wasm build from https://github.com/vmware-labs/webassembly-language-runtimes/releases/tag/python%2F3.11.3%2B20230428-7d1b259 - but actually figuring out how to do it has mostly defeated me, bar this example here which uses a tmp filesystem in a way I'd rather avoid: https://til.simonwillison.net/webassembly/python-in-a-wasm-sandbox
@Simon Willison `wasmtime-py` + `componentize-py` should do what you need and not require writing any C or Rust code. You would need to write a bit of WIT to represent the interface the host uses to talk to the guest running in the sandbox, but otherwise it would be pure Python on both sides.
This https://github.com/dicej/componentize-py ? interesting, hadn't seen that one
Yes, it's quite new and still under development, but it works.
My next goal is to publish artifacts to PyPI so you can `pip install` it.
I have a strong hunch that there is massive, pent-up demand for an easy way to safely run untrusted Python and JavaScript code using wasmtime-py / wasmer-python / etc, and the first project to release a "pip install" package that can do this (and hide all of the WASM / WIT / etc details) will find themselves with a massively popular project
The tricky bit would be hiding the WIT details. We'd need some way to generate WIT from Python code, I guess.
(which can't be done in the general case, but could be done for a subset of cases)
it's frustrating because it feels like this should be one of the most obvious and useful applications of WASM, but it's way too hard to figure out how to do it right now
I would hope I don't need to learn WIT - I only want one function exposed to me, "run_python_code_in_sandbox_and_return_stringified_result(untrusted_string_of_python_code)" - basically I want a safe eval() alternative
Joel Dice said:
Shannon Duncan (shadowcodex) the name is not relevant -- it's the fact that the type is used as the second type argument to `result`. I.e. any time we have `result<T, E>`, where `T` and `E` are types, we need to treat `E` as an "error" type.
Thanks Joel, I'm learning :smile: more and more! If that's the case I believe E should be of type `Exception`. Saw some discussion on some forum somewhere about adding a dataclass attribute to Exception, but I think that affects the `__str__` dunder method.
@Simon Willison I agree it sounds great. I think the main thing missing is a sort of reverse binding generator which, instead of generating Python from WIT, generates WIT from (a subset of) Python. Not a trivial project, but doable.
Joel Dice said:
The tricky bit would be hiding the WIT details. We'd need some way to generate WIT from Python code, I guess.
Wonder if function decorators could help solve this.
I think I want something much simpler than that - literally a version of eval() that I can call where the arbitrary code I pass to it is evaluated in a WASM sandbox
Right; I mean that the Python->WIT thing would happen under the hood and not be exposed to the app developer.
That would solve the problem I have today - I'd be happy to adopt some brilliant future solution that lets me use function decorators and generates WIT and suchlike, but honestly I just want to run eval("3 * 5") and get back 15, safe in the knowledge that untrusted code can't break my application or take over my computer
oh, actually I see what you're saying now -- we just want to pass a string of Python code to the sandboxed interpreter and have it eval'd there. Yeah, that wouldn't need any WIT stuff.
So we could use `componentize-py` today to generate a component with a simple, general-purpose interface, e.g. `func eval(code: string) -> result<string, string>`. You'd need to deserialize (unpickle?) the result according to the expected Python type, I guess.
Yup, that would solve my problem perfectly - I'm completely fine rolling my own serialization/deserialization stuff on top of that
I'd probably use JSON for that to avoid any security concerns involving pickle
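A minimal sketch of the JSON round trip discussed here, assuming hypothetical `eval_in_guest` and `eval_from_host` helpers. Both run in the same process in this sketch, so nothing is actually sandboxed; in the real setup the first function would live inside the component behind the `eval(code: string) -> result<string, string>` interface:

```python
import json


def eval_in_guest(code: str) -> str:
    """Hypothetical guest-side export: evaluate one expression, return JSON.

    The return value plays the role of the ok payload; a raised exception
    plays the role of the err payload. NOTE: no sandboxing happens here.
    """
    value = eval(code, {})  # fresh globals; builtins are still available
    return json.dumps(value)


def eval_from_host(code: str):
    """Hypothetical host-side helper: call the guest and decode its JSON reply."""
    try:
        return json.loads(eval_in_guest(code))
    except Exception as exc:  # stands in for the `err(string)` case
        raise RuntimeError(f"guest evaluation failed: {exc}") from exc
```

Using JSON rather than pickle means the host only ever decodes plain data (numbers, strings, lists, dicts), at the cost of rejecting results that are not JSON-serializable.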
If I have some time this week, I'll put together a proof-of-concept for this and report back. I haven't actually used `wasmtime-py` yet, so this is a good excuse to try it out.
that would be amazing! Can't wait to see what you come up with
Another potential stepping stone on the way to componentize-py as the sandboxing mechanism for running Python could be using a runtime like Deno with pyodide. I submitted a trivial patch to pyodide so that you can run pyodide in Deno via the npm compatibility layer. That way you can use Deno to sandbox the I/O and WASM to sandbox the Python runtime. It still needs some docs, but I added some examples to the pyodide issue on Deno support.
// example.ts
import pyodideModule from "npm:pyodide/pyodide.js";
const { loadPyodide } = pyodideModule;
const pyodide = await loadPyodide();
const result = await pyodide.runPythonAsync(`
3+4
`);
console.log("result:", result.toString());
Yeah that's exactly what I want to be able to do - I'd love to be able to do that in Python, not just in JavaScript
Although I realize that the catch with Pyodide is that it doesn't provide an easy way to restrict memory usage - I guess because that's protection that browsers already provide. For server-side code I want the ability to restrict to a specific number of MBs of available memory for the untrusted code to operate in
Simon Willison said:
Although I realize that the catch with Pyodide is that it doesn't provide an easy way to restrict memory usage - I guess because that's protection that browsers already provide. For server-side code I want the ability to restrict to a specific number of MBs of available memory for the untrusted code to operate in
Yeah that has to be provided by the runtime. Browsers vs wasmtime.
Wrote up an experiment I did running Pyodide inside Deno inside a Python subprocess: https://til.simonwillison.net/deno/pyodide-sandbox
Shannon Duncan (shadowcodex) said:
Joel Dice said:
Shannon Duncan (shadowcodex) the name is not relevant -- it's the fact that the type is used as the second type argument to `result`. I.e. any time we have `result<T, E>`, where `T` and `E` are types, we need to treat `E` as an "error" type.
Thanks Joel, I'm learning :smile: more and more! If that's the case I believe E should be of type `Exception`. Saw some discussion on some forum somewhere about adding a dataclass attribute to Exception, but I think that affects the `__str__` dunder method.
You must inherit from `Exception` if you are going to raise an exception. And as I said, I have never seen an exception class be a dataclass, so you're in uncharted territory in terms of compatibility.
Kevin Smith said:
I still feel like Shannon Duncan (shadowcodex)'s overall exception hierarchy is the correct way, but maybe just add the dataclass features like you originally had to add formal definitions of the payload. As Brett mentioned, I've never seen exceptions as dataclasses before either, but they might be needed to make this work on both sides. While it's a little sketchy, maybe add an `args` property to the base exception class as well to return a tuple that contains the data value. That way `e.args` and `e.value` won't get out of sync.
Every Python exception already has an `args` attribute thanks to `Exception`:
```python
>>> try:
...     raise RuntimeError("I have an args")
... except RuntimeError as exc:
...     print(dir(exc))
...     print(exc.args)
...
['__cause__', '__class__', '__context__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__suppress_context__', '__traceback__', 'add_note', 'args', 'with_traceback']
('I have an args',)
```
And you can pass an arbitrary number of arguments:
```python
>>> try:
...     raise RuntimeError("I have an args", "so many args")
... except RuntimeError as exc:
...     print(dir(exc))
...     print(exc.args)
...
['__cause__', '__class__', '__context__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__suppress_context__', '__traceback__', 'add_note', 'args', 'with_traceback']
('I have an args', 'so many args')
```
Here's what I ended up doing in `componentize-py` (happy to change it if there's a better option that works in all cases):
I started with how `wasmtime-py` currently represents `result`:
T = TypeVar('T')

@dataclass
class Ok(Generic[T]):
    value: T

E = TypeVar('E')

@dataclass
class Err(Generic[E]):
    value: E

Result = Union[Ok[T], Err[E]]
And made the smallest change that could possibly work, which is to make `Err` extend `Exception`:
T = TypeVar('T')

@dataclass
class Ok(Generic[T]):
    value: T

E = TypeVar('E')

@dataclass
class Err(Generic[E], Exception):
    value: E

Result = Union[Ok[T], Err[E]]
That means the `E` type need not extend `Exception`. So values of type `E` can't, in general, be `raise`d, but values of type `Err[E]` can. For exports, `componentize-py` will catch `Err`s and turn them into `result`s to pass back to the host. For imports, if the host returns an error, it will be wrapped in an `Err` and `raise`d.
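A minimal sketch of the export and import handling described above. The `Ok`/`Err` definitions follow the snippet in this message; `lower_export`, `lift_import`, and `halve` are hypothetical names invented for illustration and are not componentize-py's actual internals:

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass
class Ok(Generic[T]):
    value: T


@dataclass
class Err(Generic[E], Exception):
    value: E


Result = Union[Ok[T], Err[E]]


def lower_export(func: Callable[..., T], *args) -> Result:
    """Hypothetical glue for an export: turn a raised `Err` into a result value."""
    try:
        return Ok(func(*args))
    except Err as err:
        return err


def lift_import(result: Result) -> T:
    """Hypothetical glue for an import: unwrap `Ok`, raise `Err`."""
    if isinstance(result, Err):
        raise result
    return result.value


def halve(n: int) -> int:
    """Example guest export: fails with a plain string payload on odd input."""
    if n % 2:
        raise Err("odd input")
    return n // 2
```

Because only `Err` extends `Exception`, the payload (`"odd input"` here) can be any generated type, including one that is also used as an ordinary parameter or record field.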
@Brett Cannon (or anyone else who knows): Is it possible to build CPython 3.11.x for WASI on Windows without resorting to WSL? I was up until 1AM last night trying everything I could think of (MSYS2, Cygwin, various flavors of Visual Studio) but never got it working. Building the bootstrap `python.exe` was trouble-free using `build.bat`, and I was able to use the `configure` script to configure for `wasm32-unknown-wasi`, but was never able to `make` without either compiler or missing file errors.
For context, I'm setting up CI for `componentize-py`. Worst case, I can just build using Linux, publish the result, and use it on Windows.
A message was moved here from #wasm > Interpreted Language Guests by Joel Dice.
@Joel Dice I have never tried to do a cross-compile under Windows, so I have no clue what would be involved (I have always done it via Linux in CI or WSL; see https://github.com/brettcannon/cpython-wasi-build for how I have currently automated it)
Ok, no worries. If WSL is required on Windows, that's fine -- just wanted to make sure I wasn't missing anything.
people build things on windows?
that's news to me
In case anyone wants to try `componentize-py` out, there are now pre-built binaries available: https://github.com/dicej/componentize-py/releases/tag/canary
$ curl -Ls https://github.com/dicej/componentize-py/releases/download/canary/componentize-py-canary-macos-aarch64.tar.gz | tar xz
$ ./componentize-py --help
A utility to convert Python apps into Wasm components

Usage: componentize-py [OPTIONS] <COMMAND>

Commands:
  componentize  Generate a component from the specified Python app and its dependencies
  bindings      Generate Python bindings for the world and write them to the specified directory
  help          Print this message or the help of the given subcommand(s)

Options:
  -d, --wit-path <WIT_PATH>  File or directory containing WIT document(s) [default: wit]
  -w, --world <WORLD>        Name of world to target (or default world if `None`)
  -q, --quiet                Disable non-error output
  -h, --help                 Print help
  -V, --version              Print version
@Joel Dice I will give it a shot. I am trying to find a way to run arbitrary Python code from our Rust code. I read https://wasmlabs.dev/articles/wasm-host-to-python/ and ended up here. Does componentize-py need to have the Python code when building, or would it be possible to load the code at runtime?
Currently it wants the Python code while building, although you can always inject code using `eval` at runtime.
BTW, I'm planning to make `componentize-py` usable as a Python library, hopefully next week. Then you'll be able to `pip install` it and write code that generates and runs (via `wasmtime-py`) components on-the-fly.
@Simon Willison I spent some time this morning creating a `wasmtime-py`/`componentize-py` demo per our earlier conversation: https://github.com/dicej/component-sandbox-demo. Unfortunately, it doesn't actually work yet, since `wasmtime` does not yet have a built-in WASI Preview 2 implementation (work in progress: https://github.com/bytecodealliance/wasmtime/issues/6370). There's also a bug where the binding generator sometimes uses Python keywords as identifiers, but that should be easy to fix.
Per @Ryan Levick (rylev)'s suggestion, I'm going to try adding an option to `componentize-py` to replace all the WASI imports with trapping stubs and see how far that gets us. Longer-term, we'll want a general-purpose "virtual WASI" component which provides e.g. a virtual, in-memory filesystem, etc. for this kind of application.
I've added a `--stub-wasi` option to `componentize-py`, and have updated the above demo, which now works.
@Joel Dice and all, I'd love to get your thoughts about benefits to understanding how your work here might integrate with https://github.com/microsoft/vscode-wasm?
open ended question. I haven't had the chance to think deeply about it yet myself, so.... just throwing that out there.
@Ralph I'll confess I don't know much about VSCode, and I can't quite tell what that project is for. Does it support hosting WASI Preview 2 components, or just WASI Preview 1 modules? If the former, then `componentize-py` could certainly integrate nicely with it.
it hosts preview 1 at the moment using wasi shims sitting on top of node (vscode's engine). In addition, it brings debugging wire-up directly into the ide oob. Very slick, and if we can wire things up to share, that would be coolio. Of course, it will eventually move to preview 2, but I've asked them to enable the javascript experience as well first.
open ended conversation here, but at some point we should set up a demo and chat/noodle for all the python heads....
those are just quick examples; it's not finished or smooth yet, so the ultimate form of experience is entirely malleable
@Joel Dice don't worry about the VS Code stuff; I work on it and it's why I'm here, so it's being looked after (and to @Ralph : the more important thing is installing Python projects which is a separate concern).
As for name clashes with keywords, FYI the convention in Python is to add a trailing _ to a name to avoid the clash.
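A small sketch of that convention, using the standard-library keyword module. The function name to_python_identifier and the kebab-case to snake_case mapping are assumptions for illustration, not the binding generator's actual code.

```python
import keyword


def to_python_identifier(name: str) -> str:
    """Map a (hypothetical) kebab-case interface name to a safe Python identifier."""
    # Assumption: names arrive in kebab-case and are converted to snake_case.
    ident = name.replace("-", "_")
    # PEP 8 convention: append a single trailing underscore on a keyword clash.
    if keyword.iskeyword(ident):
        ident += "_"
    return ident


escaped = [to_python_identifier(n) for n in ("class", "from", "get-value")]
```

keyword.iskeyword only covers hard keywords; soft keywords such as match are still valid identifiers, so this sketch leaves them alone.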
@Joel Dice Thanks for componentize-py and the demo repos! Is there a way I could bring in 3rd party libraries that have c-extensions (specifically, numpy) in my component? Specifically, I'm getting "Original error was: No module named 'numpy.core._multiarray_umath'". I'm thinking that it's either (a) not actually possible using the wasmtime/componentize tool chain, or (b) that some part of that chain needs to be built with the correct wheels.
@Pamela McA'Nulty I'm glad you asked, because that's what I'm working on at the moment (details here: https://hackmd.io/IlY4lICRRNy9wQbNLdb2Wg). Unfortunately, it's not possible yet, but I hope to make it possible in the near future.
(left a comment with a question on the hackmd)
How does sound for our next meeting? Let me know if you'd like to attend and that doesn't work for you.
@Joel Dice are y'all doing official calendar invites?
Not yet, but I can start doing that. I'll create one and add you if you DM me your email address.
:wave: what's this meeting about? running python apps in webassembly runtimes? or embedding runtimes in python apps?
It's about running Python apps in WebAssembly runtimes, yes, and specifically constructing components implemented in Python that follow the Component Model (https://github.com/WebAssembly/component-model/).
Planning to meet in about 30 minutes at https://meet.jit.si/PythonComponentTooling
Agenda and notes here: https://hackmd.io/ZXNfJqvFQ0KvaWRImWSnqg
I should have mentioned in the meeting that I was using wasix when building numpy, so that might have paved over some issues that they may want to fix in a more permanent way like Brett did with Python itself. That could add to the number of changes needed for WASI.
I assume there's a meeting today?
Yes, 45 minutes from now at https://meet.jit.si/PythonComponentTooling
Will post an agenda here shortly: https://hackmd.io/HXrhjkMXRI20jU9x46UK2A
@Brett Cannon @Kushal Das I've managed to build a WASI libpython3.11.so and call into it via the C API from another .so: https://github.com/dicej/component-linking-demo. Now I'm trying to import ujson, which is somewhat predictably failing, considering only ujson.cpython-311-darwin.so is in sys.path. So I'm trying to figure out what the appropriate file name(s) for a WASI build of ujson might be (ujson.cpython-311-wasi.so, maybe?). When I add debug logging to trace all file opens and stats, I don't see anything, so it's not clear to me that importlib is looking for _anything_ on the filesystem. Any advice for debugging?
❯ ./run_wasi.sh -c "import importlib.machinery; print(importlib.machinery.EXTENSION_SUFFIXES)"
[]
Looks like it's completely disabled ATM for extension modules since no one expected it to work. :sweat_smile: I opened https://github.com/python/cpython/issues/105738 to fix it.
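To make the connection explicit: the path-based finder tries a module name plus each suffix in importlib.machinery.EXTENSION_SUFFIXES, so an empty list means no extension-module file name is ever tried. A minimal sketch (the helper name candidate_extension_filenames and the sample suffix list are illustrative assumptions, not output from a real WASI build):

```python
import importlib.machinery


def candidate_extension_filenames(module_name, suffixes=None):
    """List the file names that would be tried for an extension module.

    Illustrative helper only; defaults to the running interpreter's suffixes.
    """
    if suffixes is None:
        suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return [module_name + suffix for suffix in suffixes]


# Hypothetical suffix list of the kind a macOS CPython 3.11 build reports.
darwin_like = [".cpython-311-darwin.so", ".abi3.so", ".so"]
on_darwin = candidate_extension_filenames("ujson", darwin_like)

# With the empty list shown above, there is nothing to look for.
on_wasi = candidate_extension_filenames("ujson", [])
```

Calling candidate_extension_filenames("ujson") with no suffix argument reports whatever the current interpreter actually supports.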
@Joel Dice would an experimental build or patch work? I'm realizing I don't know if I can even fix this upstream due to the lack of dlopen() to even build against. But I can probably give you a patch to apply to Python's source to test this out.
Yes, a patch would be great. I've already forked the cpython repo to make it build with a patched version of wasi-sdk 21. Just trying to get everything working before I start opening upstream PRs.
Python meeting today at https://meet.jit.si/PythonComponentTooling. Feel free to add to the agenda: https://hackmd.io/DpFFGyoYRtq5UBfv1ZCT8Q
I think you meant ?
Yes :point_up:
FYI, further discussion of Python guest tooling will happen in #SIG-Guest-Languages
Our next Python SIG meeting is on Thursday, July 20. @Joel Dice will be on vacation. Does anyone have any agenda items for that meeting?
If no one has agenda items for tomorrow's meeting, we can cancel this one. Any objections?
@Joel Dice is out again this week. If anyone has anything to discuss, I can host the meeting. I've been out of the Wasm loop for a little while because of other project priorities, so I don't have anything new right now. Just let me know if you have a reason to meet this week.
Only thing I had was that I tried to compile MicroPython via WASI but failed due to its use of setjmp.h. I was going to ask what the status of the exceptions proposal was, since it seems that's necessary to fix that for WASI-libc?
Let's shift meeting announcements/scheduling over to a topic in #SIG-Guest-Languages in the future.
@Brett Cannon I put the threads proposal on the main group agenda for next Tuesday. I'll also add on exceptions too.
Last updated: Jan 24 2025 at 00:11 UTC