in the spirit of starting a thread on this, I've noticed notification emails being received very late today, I'm only just now getting notifications in email for stuff that happened hours ago
Continuing in that spirit: GH Actions and GH as a whole have been slow/buggy this week (and a little of last week)? My emails have also been coming through slowly, but I haven't seen Actions failures as a symptom just yet
pages are loading slowly and getting intermittent errors for me rn
https://www.githubstatus.com/ says "notifications are delayed" but it seems like it's their whole system that is being sluggish
aaand now I'm getting unicorns
git pushes are failing now too
man things fell over fast
Screenshot 2026-02-09 at 10.46.38.jpg
it's like it's christmas!
so colorful!
when I do manage to load a PR or something, it seems like actions are not getting scheduled for the PR at all
going through a VPN with a European exit node might help? https://eu.githubstatus.com/
there was a snow hour over here, too
my US team seems like it's working again?
Till Schneidereit said:
going through a VPN with a European exit node might help? https://eu.githubstatus.com/
routing through Germany I'm getting extremely slow git pushes right now as well as unicorns -- my guess is this is more of a backend thing than a frontend thing
everything got green for a bit and it's all back to very red
We're not really keeping track per se, but at some point we're going to cross the threshold of "it would be cheaper to hire someone to maintain self-hosted CI infrastructure"
Three Mondays in a row with major outages; perhaps all BA member companies should adopt 4-day workweeks Tue-Fri ¯\_(ツ)_/¯
Till Schneidereit said:
going through a VPN with a European exit node might help? https://eu.githubstatus.com/
Wouldn't that only help if the bytecodealliance enterprise account itself is a European account?
I'm fairly sure there was an underlying resource failure of some sort; it absolutely went out here in the EU as well, but was back up in about 15 minutes or so......
FWIW, all of gh is on a stability freeze -- no new rollouts or config changes of any sort -- to fully understand and rectify what happened particularly this week.
Screenshot 2026-02-10 at 09.34.58.jpg
so it begins anew...
good god
are you running again, or is it STILL there?
I have done much GitHub myself this morning and the page is all green now so hopefully fine...
Screenshot 2026-02-11 at 10.17.07.jpg
Another day, more errors. I'm seeing a lot of delayed notifications this morning as well as a lot of spurious failures in this CI run
all I can do here is listen to the pain and pass it along to Ben
Oh that's understandable yeah, this is primarily a heads-up channel for us so we can share what we're seeing and be aware of outages/problems on our end
totes git it; I'm just letting you know that I'm backchanneling but also that I can't do more than that
that's also much appreciated too!
Maybe we can get the CEO of GraphQL on the phone
(apologies, couldn't resist)
hey, any port in a storm, right?
as it happens, the ex CEO of GH is starting his own new GH, so maybe we can all move there while they don't charge anything? :-)
meanwhile, the poor pm who has to deal with all this from customers:
image.png
due diligence: he IS kidding, painfully
We've talked about retries and such before, but here's an example of an exponential backoff and it just fails every time...
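for reference, the retry is roughly this shape -- a minimal sketch, not our actual workflow; the URL and retry counts here are illustrative:
# retry-with-exponential-backoff sketch; the release URL and retry
# parameters are illustrative, not what our CI actually uses
url=https://github.com/bytecodealliance/wasm-tools/releases/download/wasm-tools-1.0.27/wasm-tools-1.0.27-x86_64-linux.tar.gz
delay=1
for attempt in 1 2 3 4 5; do
  curl --fail --location --output wasm-tools.tar.gz "$url" && break
  echo "attempt $attempt failed; retrying in ${delay}s" >&2
  sleep "$delay"
  delay=$((delay * 2))
done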
Could be hitting the rate limit for unauthenticated requests...
Could try using the gh CLI which can download via authenticated API calls, e.g. for the example you linked this seems to work: gh release download --repo bytecodealliance/wasm-tools wasm-tools-1.0.27 -p wasm-tools-1.0.27-x86_64-linux.tar.gz
I believe gh is preinstalled for standard actions runners but it might require a bit more config to make it authenticate as the action: https://docs.github.com/en/actions/tutorials/authenticate-with-github_token#example-1-passing-the-github_token-as-an-input
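i.e., roughly this in a run step (untested sketch; I believe gh picks up GH_TOKEN from the step's env, e.g. env: GH_TOKEN: ${{ github.token }}):
# inside a workflow run step with GH_TOKEN set in the step's env,
# gh downloads the asset via an authenticated API call instead of an
# anonymous one, which avoids the unauthenticated rate limit
gh release download --repo bytecodealliance/wasm-tools wasm-tools-1.0.27 \
  -p wasm-tools-1.0.27-x86_64-linux.tar.gz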
Alternatively: https://github.com/marketplace/actions/release-downloader
It looks like each attempt there downloads ~55kB then stalls -- I'd expect a rate limit to immediately return a 429 or 500 or whatever. Looks like maybe a CDN/cache problem as each download stalls at the same chunk? In any case, points more to "flaky platform" than "problem that we can solve easily" IMHO
we could also try to cache tool downloads, so that we're presumably at least closer to the storage the bits come from -- and they'd all come from the same storage?
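rough shape of that (pure sketch: assumes we persist a tools directory across runs with actions/cache; the paths and version here are made up):
# hypothetical check-before-download wrapper; the directory would be
# saved/restored across runs by actions/cache, keyed on the tool version
tools_dir="$HOME/.tool-cache/wasm-tools-1.0.27"
if [ ! -x "$tools_dir/wasm-tools" ]; then
  mkdir -p "$tools_dir"
  gh release download --repo bytecodealliance/wasm-tools wasm-tools-1.0.27 \
    -p wasm-tools-1.0.27-x86_64-linux.tar.gz
  # adjust --strip-components to the actual tarball layout
  tar xf wasm-tools-1.0.27-x86_64-linux.tar.gz -C "$tools_dir" --strip-components=1
fi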
https://www.githubstatus.com/ is green but I'm getting intermittent unicorns rn
If you see
---- cli_tests::test_programs::p3_cli_serve_hello_world_many_no_concurrent_reuse stdout ----
failed to wait for child or read stdio: child failed Output { status: ExitStatus(ExitStatus(1)), stdout: "", stderr: "\nthread 'tokio-rt-worker' (8740) panicked at C:\\Users\\runneradmin\\.cargo\\registry\\src\\index.crates.io-1949cf8c6b5b557f\\tokio-1.51.1\\src\\sync\\mpsc\\list.rs:278:9:\nattempt to subtract with overflow\nnote: run with `RUST_BACKTRACE=1` environment variable to display a backtrace\n" }
Error: failed to read body
Caused by:
0: error reading a body from connection
1: unexpected EOF during chunk size line
in CI logs, it's a spurious failure. This is https://github.com/tokio-rs/tokio/issues/8061, and while this has been a bug in Tokio for a long time, it seems the tokio update in https://github.com/bytecodealliance/wasmtime/pull/13104 caused scheduling changes such that it happens more frequently now.
CI is broken until https://github.com/bytecodealliance/wasmtime/pull/13150 lands
sorry about that; did I miss something on the add-a-new-crate checklist? annoying that this doesn't surface until a release
last time I dug into this, it was actually impossible to prevent this from happening; when adding a new crate, we're forced to accept that CI will be broken on the next publication
I forget exactly why, though, and things have changed since then (we publish things ahead of time now), and we add crates rarely enough that I never bothered to re-check
so, no, no mistake on your part and our docs don't mention this, it's just always a fun surprise on the next publish heh
https://www.githubstatus.com/incidents/myrbk7jvvs6p it's another day ending in y
Ok this is a first I think -- github seems to have corrupted a merge to the main branch
https://github.com/bytecodealliance/wasmtime/pull/13180 just landed on the tip of tree, and the diff there looks as-expected
However the squashed commit -- https://github.com/bytecodealliance/wasmtime/commit/0c3a69f18df3e6939048b68e9d0dcb5a4d4518f3 -- seems to additionally include a revert of the parent commit -- https://github.com/bytecodealliance/wasmtime/commit/54929c175c1249b8d1978a76c54f92c0317b0181
so github has helpfully reverted a commit for us
I've... never seen data corruption before
how many other PRs have landed and been silently reverted.... I have no idea
that's... extremely odd? race condition wrt base branch maybe? (clearly a GitHub bug)
according to https://www.githubstatus.com/incidents/zsg1lk7w13cf
We have identified a regression in merge queue behavior present when squash merging or rebasing. We have identified the root-cause and are in the process of reverting the change.
the perils of opaque SaaS providers
(I say as an employee of a SaaS provider, speaking with other employees of other SaaS providers)
well, I'll just reland Nick's patch and pray that's the only victim
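concretely, something like this (sketch; branch name made up, the sha is the parent commit linked above):
# 0c3a69f, the squash of #13180, also undid its parent commit, so
# re-apply that parent commit on top of current main and open a new PR
git fetch origin main
git checkout -b reland-nicks-patch origin/main
git cherry-pick 54929c175c1249b8d1978a76c54f92c0317b0181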
it's ... kind of insanely lucky that I caught this
I just happened to want to do a small follow-up and couldn't find the code when I was trying to do that
otherwise we never would have noticed this
at the risk of re-igniting the "should we be on GitHub" question, silently losing a commit is kind of the worst sin that a git host could commit
I don't know what to do about that but just want to say it out loud
status page now says:
Update - We have resolved a regression present when using merge queue with either squash merges or rebases. If you use merge queue in this configuration, some pull requests may have been merged incorrectly between 2026-04-23 16:05-20:43 UTC.
I can only hope they realize how utterly serious this is
uptime is almost nothing compared to data loss
whelp it happened again
Let's not land anything else today...
Maybe others did as well, but FYI I got an email from GitHub notifying us of two PRs being dropped instead of merged, so they at least seem to realize that yes, this is quite terrible
I also got an email yeah, I'll mitigate this morning
Yes, I received that email notification from GitHub as well, identifying the PRs impacted and providing follow-up details. Thanks, everyone.
update: yes, they realize it was quite terrible. :-/
AGAIN????
fyi: (i) it seems we missed patching v24 (LTS) from our January CVE (https://github.com/bytecodealliance/wasmtime/issues/13211 just reported); and (ii) I will not do a patch-release for this today, because GitHub Status is red. Another point for "what the fuck, we need a different repository host"
according to https://github.blog/news-insights/company-news/an-update-on-github-availability/ silently reverting commits is not data loss since the previous commits were still in the history
also as news to all it's a day ending in 'y' so there's another github outage today
jesus, what a mess
wish I could help, but I can't
The blog post also describes the merge-queue bug as affecting "merge groups" with more than one PR; that's not us, but we were still affected. Even aside from the PR spin about "no data loss" (sure, if you want to call it data corruption instead, we can), that's concerning from an accurate-postmortem point of view
in all this, I do want to give credit for the fact that we were notified via email within a few hours, and the email included the affected PRs. In combination with the commits still being addressable, that at least meant that even in the extremely unlikely scenario where no other copies would've existed, we could've restored them, and we knew we'd have to pretty quickly
speaking, or trying to speak, objectively: it sucks, and if this were the normal state of things I'd never put my work there. It hasn't been this bad before, and I don't have insight into what the issue is now (I could ask, but the people I know are underwater trying to stabilize things, as you might imagine), but hey -- make it work or lose the user is pretty much the name of the game.
there are other things going on too, of course, including a yearly rate of growth that I'm not at liberty to discuss but that is absolutely insane and makes my megacorp gasp. But again, none of that matters if they corrupt my stuff, let alone block work a bunch of times each week.
https://mitchellh.com/writing/ghostty-leaving-github pretty much sums it up for most of the people I know:
Lately, I've been very publicly critical of GitHub. I've been mean about it. I've been angry about it. I've hurt people's feelings. I've been lashing out. Because GitHub is failing me, every single day, and it is personal. It is irrationally personal. I love GitHub more than a person should love a thing, and I'm mad at it. I'm sorry about the hurt feelings to the people working on it.
I've felt this way for a long time, but for the past month I've kept a journal where I put an "X" next to every date where a GitHub outage has negatively impacted my ability to work. Almost every day has an X. On the day I am writing this post, I've been unable to do any PR review for ~2 hours because there is a GitHub Actions outage. This is no longer a place for serious work if it just blocks you out for hours per day, every day.
It's not a fun place for me to be anymore. I want to be there but it doesn't want me to be there. I want to get work done and it doesn't want me to get work done. I want to ship software and it doesn't want me to ship software.
lotsa fun
From the MS people on the runtime side, it seems the amount of work GitHub is having to do has increased significantly due to AI
https://github.blog/news-insights/company-news/an-update-on-github-availability/
OH YES
another data point: over the past two years, roughly 90% of the data in the entire world was created by AI
and we can guess how much of that was worthless
so... if you throw in the GH "universal user" bug they had to fix over the past two months, plus the AI scale-up and the human scale-up, it's a hard job. That said, uptime and consistency are their raison d'être, as they say...
so.... do they have a raison?