Mastodon stampede

"Federation" now apparently means "DDoS yourself."

Every time I do a new blog post, within a second I have over a thousand simultaneous hits of that URL on my web server from unique IPs. Load goes over 100, and mariadb stops responding.

The server is basically unusable for 30 to 60 seconds until the stampede of Mastodons slows down.

Presumably each of those IPs is an instance, none of which share any caching infrastructure with each other, and this problem is going to scale with my number of followers (followers' instances).

This system is not a good system.


Update: Blocking the Mastodon user agent, "(Mastodon|http\.rb)/", is a workaround for the DDoS. The side effect is that people on Mastodon who see links to my posts no longer get link previews, just the URL.
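
If you want to do the same, a minimal sketch of that block as an Apache .htaccess rule (assuming mod_rewrite; adjust to taste) is something like:

RewriteEngine On
# Refuse any request whose User-Agent matches the Mastodon fetcher
RewriteCond %{HTTP_USER_AGENT} (Mastodon|http\.rb)/
RewriteRule .? - [F]

The fuller rule I ended up with, which also catches Pleroma, Akkoma, Misskey and GoToSocial, is down in the comments.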

Previously, previously, previously.


106 Responses:

  1. Rodger says:
    5

    You may find Kris’ write-up interesting. TL;DR: Mastodon has huge traffic overheads.

  2. Jer says:
    46

    This is absolutely not a good system, but also, it seems a little weird that every pageview of your blog involves the database...

  3. db48x says:
    9

    It’s a good suggestion. Unless a visitor has a session cookie, they don’t need an artisanal rendering of the webpage; you can let Apache cache it for a few minutes instead.

    • jwz says:
      24

      Pop quiz, how many years have I been running this blog? Do you think that I have not already considered or tried literally every boneheaded suggestion that happens to be the first thing that pops into your head? Really?

      • Krisjohn says:
        1

        This so describes most of my weeks dealing with customers.

      • MattyJ says:
        7

        Did you try unplugging it and plugging it back in? I mean, it's not like you helped birth the WWW or anything.

      • Eric TF Bat says:
        3

        Given how much I hate working with caching as a web dev, I'm going to take your comment here and wrap it up and cherish it, and then take it out and WAVE IT AT PEOPLE the next time they, too, suggest that the laws of physics and computing can be avoided by pressing a switch.  Thank you, and may the gods of the internet have mercy on your server.

    • Rodger says:
      8

      I am not convinced that replacing a capacity problem with one of the two hard problems of computer science is the correct answer.

  4. If running a LAMP stack, you may consider mod_evasive (https://phoenixnap.com/kb/apache-mod-evasive) to deal with burst traffic.
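
    For reference, a typical mod_evasive setup looks roughly like the following (hypothetical threshold values; the counts are kept per client IP, and the module may be named mod_evasive24 rather than mod_evasive20 depending on the build):

    <IfModule mod_evasive20.c>
        # block an IP that requests the same page more than 5 times in 1 second
        DOSPageCount        5
        DOSPageInterval     1
        # ...or makes more than 50 requests to the whole site in 1 second
        DOSSiteCount        50
        DOSSiteInterval     1
        # how long (in seconds) the offending IP then gets 403s
        DOSBlockingPeriod   60
    </IfModule>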

    • thielges says:

      Thanks. That seems useful against real DDoS attacks. But for this problem, which only looks like a DDoS and is actually a legit albeit heavy-handed response, throwing a 403 back at a legitimate requestor is undesirable.

    • Michael Sternberg says:

      I gather this module’s config provides for blacklisting an IP only on repeated requests for the same resource.

      How would that help against a stampede where a myriad of clients request any specific resource just once each?

  5. Elusis says:

    I'm curious whether anyone has made one of the automated "grep your Twitter follows for Mastodon addresses; import the .csv" processes work. I uploaded mine days ago, and I'm still only following, like, five people.

  6. Carlos says:
    1

    The Fediverse design could use improvement.  But in addition to the problem you note with Mastodon in particular, its implementation seems to have absolutely ridiculous server resource requirements for the amount of work it gets done.  I don't know if it's because the thing is implemented in ruby-on-rails and node.js, or if it's just really terribly designed, written, and (un)optimized - but there's no way it's going to have more than a niche set of instance operators if they require these massive systems to handle relatively small amounts of users and traffic.

    C.

  7. jwz says:
    10

    Reporting in from the field: @crschmidt:

    Fun fact: sharing this link on Mastodon caused my server to serve 112,772,802 bytes of data, in 430 requests, over the 60 seconds after I posted it (>7 r/s). Not because humans wanted them, but because of the LinkFetchWorker, which kicks off 1-60 seconds after Mastodon indexes a post (and possibly before it's ever seen by a human).

    Every Mastodon instance fetches and stores their own local copy of my 750kb preview image.

    lol, and because @jwz boosted _this_ post -- which does not include my URL in it! -- I got _another_ stampede... because Mastodon fetches the "context" of the post as well, so all the Mastodon servers with someone following jwz got both this post and the parent post indexed, and those servers all crawled mine as well.

    The server I am posting it on could just... fetch the content. Once. And include it in the federation that it sends to all the servers. It would make posts take an extra 2-5 seconds to federate, but with the benefit that cards would be pre-populated, so they would be _faster_ to the end user!

    • jwz says:
      11
      • Joe Luser says:
        6

        that first thread is spectacular. i'm not sure what other people might consider the highwater mark of absolute idiot moronism going on there.  but for me when the neckbeard shows up to explain how, because there are alternate ways to ddos a website, mastodon need not be concerned about designing in a critical ddos flaw that will get predictably worse with scale, that's the moment.

      • Derpatron9000 says:
        2

        Reading that first thread was so painful, I couldn't bring myself to read the other. Fuck these people.

      • derpatron9000 says:
        2

        Let's create more problems by throwing IPFS into the mix.... great work guys.

        • volkris says:
          3

          I suspect IPFS would crash and burn as a solution, but at least it would be more interesting to watch than a plain old web request overload! ;)

      • 7

        Best comment in the first thread:

        We've seen instances running malicious code modify payloads when federating before.

        Casually remarked upon as if this weren’t a massive problem with the entire fucking concept of your app

      • guildz says:
        1

        The few mastodon issues I've read seem to go that way: "there's an issue with how you are doing things" or "this is a much wanted feature", then a maintainer (*cough* gargron *cough*) responds "well I don't think it should change", then the convo ends. Now that we are getting past the initial explosion, it might be time for instances to start moving to a mastodon fork which isn't a baby and can be more freely worked on by the community.

      • Dave Taht says:

        A couple notes. 1 - bufferbloat is a thing. Not only is your site getting flooded, but starting up so many connections at the same time is *really hard on your network*. I'd hope you were running fq_codel or cake on your server in the first place, and sqm on your router. However even fq_codel starts falling apart under a workload like this which you can see with tc -s qdisc show. Cake might be better.

        2 - what also really hurts here is the syn flood. Linux's synflood protection is set WAY too high for most people's networks, and if you just start dropping syns, tcp's natural exponential backoff should make it a bit less horrible.

        3 - I'd really love a packet capture of what happens to anyone in your situation - start a tcpdump -s 128 -i your_interface -w somefile.cap - (just need the tcp headers) - do a post - get slammed  - stop the cap. I'd be perversely happier if the network was behaving ok and it was just your cpu going to hell... but...

    • spammir says:

      Oooh, it is a server issue, not DDOS from clients? That makes the potential to weaponize it even greater. Don't even need an account with followers...

  8. 8

    Maybe Mastodon was created by clown services to get more money on network data transfer.

    • David Fetter says:

      Clown Services is the name of my next startup.

      Also, the aforementioned clowns don't do that much advance planning, what with Wall Street and quarterly executive comp and shit. Ideology-driven idiocy suffices to explain what we're seeing here.

  9. J. says:
    11

    Perhaps it's time to bring back the Usenet newsreader warning:

    This program posts news to thousands of machines throughout the entire civilized world. Your message will cost the net hundreds if not thousands of dollars to send everywhere. Please be sure you know what you are doing. Are you absolutely sure that you want to do this?

  10. tfb says:
    7

    This whole thing is cryptocurrency part two, isn't it?

    Here, I have this radical new decentralised thing which cannot, even in principle, replace the thing it claims to be able to replace and in fact serves no clear purpose at all.  And look, because I am so clever and understand everything so well I will design in many features which ensure it cannot really scale, will attack other systems, and will use a great deal of power to deploy.  Look at me I am so clever.

    More amazing still is that, even as crypto is finally collapsing under the weight of its own vast idiocy, people are rushing to this new equally stupid thing.  Because, somehow, this time, the technical fix to the social problem will work.

    • berend says:
      4

      It's not until it:

      • claims to make money (running ads?) by running a mastodon server
      • tells all your friends to make money running a mastodon server

      Then later:

      • host your mastodon server on our exchange.  We'll keep it safe, we promise

      So far it's only got:

      • decentralized
      • too many resources
      • hype

      • elm says:

        Caching on the Blockchain, and each cache/federated thingy will be explicitly allowed to inject more ads and malware and popups.

  11. volkris says:
    6

    The worst part is that this was all so foreseeable.

    When I pulled up the ActivityPub standard specification and started reading about the inboxes and outboxes, that was my immediate thought: won't this design cause a huge scalability problem?

    But I'm assured that the people behind the standard have a lot of experience, have been working on it for years, while I'm just a guy who pulled up the spec on my lunch break. I guess I should trust in their expertise, that they've thought this through. Right?

    Programmers are still taught the concept of big O analysis... right?

    • thielges says:
      8

      > Programmers are still taught the concept of big O analysis... right?

      I'm surrounded by peers with CS, CE, and EE degrees, though few are familiar with O(...) analysis. Most of the senior architects get it, but they're only responsible for a tiny amount of the actual coding. For the rest I fall back on plain old algebra and hand-sketched quadratic curves.

      Even after simplifying to algebra, I can only reach about half the coders before they write their (invisible by looking at a single function definition) quintuple nested loops.  So we have to let empirical measurement provide the proofs for those who don't believe the algebra.

      As a result we lean heavily on our QA team to write realistic benchmarks to empirically flush out these performance design flaws.  That's very important as our products are designed to handle hundreds of billions of objects and the development teams are accustomed to writing test cases with just a few hundred objects.

      I think that part of the issue is many coders have no idea of how a computer system executes their code.  They're working within an idealized programmer's model that doesn't take into account physical constraints like cache sizes, hit rates, and link speeds.  "My algorithm is so powerful that it needs 2TB of physical memory!"

  12. Mika Raento says:
    2

    Genuine question: do you have a concrete idea about how to create a federated version of twitter/facebook/google+/usenet at current volumes of posts and users that doesn't put a largish load on each participating node?

    I was one of the people to suggest a federated social network at Google around ~2008. While I now realize that was not viable from a business perspective (how would you mediate a truly federated network?) I do think there's a fundamental tradeoff between centralisation and federation in terms of the cost of propagating the data.

    Not claiming you are mourning for usenet but I think the quote is appropriate in terms of design goals. "Those who mourn for 'USENET like it was' should remember the original design estimates of maximum traffic volume: 2 articles/day" -Steven Bellovin

    Disclaimer: I again work for Google but the opinions expressed are mine and not my employer's.

    • jwz says:
      10

      If I had concrete ideas about how to solve these problems, I'd have written a protocol spec. I'm not gonna be the "you know what you OUGHTA do" guy here because these are, in fact, hard problems. But they are also part of a class of problems that have already been solved, at scale, in the past. So while it's hard, there's nothing that makes me think that it's impossible or even impractical.

      But the fact that the Mastodon developer community's reaction to "here's a real-world scaling problem happening right now" is victim-blaming and 5 years of delay rather than "we'd better find a solution to that pretty quick" is not a great sign.

      • Rodger says:
        4

        I feel like it’s only a matter of time before people start serving goatse to poorly behaved herds of user agents.

    • elm says:
      1

      Examining it from a business case perspective tells one rather a lot about what the solutions aren't.

      End users don't give a fuck about federation and the money to be made owning 1/20th of a Twitter or 1% of a Facebook is minuscule.

      If we boil the ocean to completely redo http, then I'm sure it's possible to start solving the technical problems and cut out the need to trust intermediaries, but then you still need the entire world to change.

      Or you can trust an intermediary enough to absorb the load elsewhere.

    • David Fetter says:
      5

      I'm nowhere near as smart as you folks, but I'm pretty sure that without aggressive moderation, which I can't picture as implemented without the generic mediation capability you touched on, "socials" turn into fascist shitpiles dominated by the absolutely worst actors. This was already the case before fascists recognized every network as an opportunity to do their thang and created tools, techniques, and resources for creating more of same in order to accelerate the process.

      • tfb says:
        1

        You in fact are at least as smart, I think, because anyone ignoring that is an idiot or someone who wants to enable fascism or other equally awful things. Moderation matters and does not scale very well.

        • jwz says:
          11

          I don't have the answers, obviously, but I suspect that the only workable solution to "how do you do moderation at scale" is going to turn out to be, "don't".

          If you take as your baseline the idea that when I participate in social media, there are a billion users who might conceivably interact with me, and now someone has to filter out the nazis... that's gonna be really hard.

          So maybe don't let a billion users interact with me. Keep the social graph actually social.

          Solving the problem of "my roommate's dipshit uncle is a nazi" is a lot easier than solving the problem of "there is a Moldovan bot farm trying to undermine global democracy".

          • tfb says:

            I think this works for people, except for very famous people who will need to spend a lot of time blocking trolls.  But I don't see any point in worrying about those people: they can hire minions to do that for them.

            But it probably does not solve the problem for organizations running these services. Twitter is probably in the process of providing a demonstration of this: my guess is that the EU actually have working teeth, countries in the EU have had actual experience with what happens when you do not deal with nazis, and they are therefore completely fine with saying 'if you are not dealing with the nazis on your platform then you are not in the EU'. And Musk will have another elaborate public squealing tantrum which will at least be entertaining.

            I'm not sure if federation solves that problem, because if there are enough nazi-infested instances around then the only practical answer is not to federate with anyone.

            It is at least possible that there is no workable solution.

      • volkris says:
        2

        FWIW, I think in cases like these it's important to separate technical problems, which call for technical solutions, from social problems, which call for social solutions. Often enough, trying to solve either with the other just makes the mess worse.

        So above we were talking about technical problems that might sink these systems even before any of the moderation issues come up. Mastodon has a back-end technical design that eats more and more computational resources, which is a problem. There may end up being no workable system left to moderate.

        Well, I suppose instances shutting the door to new users they can't afford is a type of social solution to the technical problem, but exactly the kind that's not ideal.

        As for your question, though, I always promote social solutions that mirror our normal social interactions, focusing on users having more agency over who they're interacting with instead of having to hope that some [maybe overworked] moderator is going to choose right for them.

        We have technical solutions to the technical problems that this brings up. Unfortunately, for whatever reason we'd care to speculate about, they haven't been applied.

    • cmt says:

      Usenet is quite federated - or should I say "it was"? It's beyond its peak. Part of the reason for using the past tense on Usenet is that its standards are stuck in the mid-'80s (RFC 1036 from 1987 was obsoleted by RFC 5536/5537 in 2009, at which time it was already too late). By the early 2000s some encodings for beyond-7bit-ASCII were somewhat accepted, but some clients still couldn't process that, and let's not talk multipart or anything beyond text/plain. There's a lesson on federated systems and the lowest common denominator of client software in there. (See also: XMPP, where you have about five mutually exclusive standards for about everything and it's up to you to figure out how clients and servers play along with that). Also, moderation, which actually does place a burden on operators. See also IRC, which is totally federated and stuck on 7bit ASCII, as they don't even have a way to signal the client's character set. To close this, let me paraphrase Niklas Luhmann, who famously wrote (in a letter) that with high complexity you get selection patterns and that you should just wait for stuff to explode.

      • timeless says:

        I haven't looked into it, but the IRC instances I use seem able to send emoji.

        • cmt says:
          1

          That's more of a happy accident, not by design. The specs (RFC 2812/2813 from April 2000) just have "No specific character set is specified" (section 2.2 (2812), 3.2 (2813)) and "The protocol is based on a set of codes which are composed of eight bits" and "delimiters and keywords are such that protocol is mostly usable from US-ASCII terminal and a telnet connection". There is no signalling of character sets and encodings (extensions have been proposed but never became official) and servers mostly just pass the bytes. If the other side sees the emojis you sent, that's only because they happened to run the same encoding as you did. GB2312 or KOI8 anyone? WTF-7?

    • volkris says:
      1

      To me this is a case where maybe there is no good, practical, workable solution to the technical side, but the space between what they're doing now and the better they could be doing is huge.

      This backend design that embraces cascades of multiplying connections throughout both participants and the larger internet, and even accepts them as preferable to an ever so slightly degraded user experience from what the lead developer just KNOWS all users demand (as per that bug report linked above), well.

      No telling how good such a system could actually be in the real world (or, at least I don't have the expertise to tell), but at the least the system could stop being so bad.

  13. Colin says:
    5

    we experienced the same thing over at cohost and repeatedly had to block the mastodon user agent until the flurry of traffic was over -- the culprit was instances scraping our site in order to generate previews of embedded links.  we ended up having to fix it by creating a specific cache to serve long-lived versions of posts just for mastodon link previews, instead of spending time rerendering them.

  14. roeme says:
    2

    Please ignore this if you don't have the time; I just like to learn from people smarter than me: Naïvely I would attempt to serve a 503 with a randomized Retry-After to Mastodon Agents if the load goes too high.

    I'm sure for some reason that doesn't work (otherwise you'd have already done it), but why doesn't it work?

    Is it because everybody (or at least the Mastodon devs) interprets the 'ought' in RFC7231 7.1.3 as "nah she'll be right mate" ?

    (Since you wrote that outright blocking the UA helps, I'm assuming this is not a bandwidth problem, since even then one still has ingress bandwidth for the initial request? Or will that be handled differently?)
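
    To make the idea concrete, an untested .htaccess sketch, assuming mod_rewrite plus some external job (a cron script, say, purely hypothetical here) that creates a flag file such as /var/run/overloaded while the load average is above a threshold and removes it once it drops:

    RewriteEngine On
    # while the flag file exists, answer fediverse fetchers with a 503 instead of a rendered page
    RewriteCond %{HTTP_USER_AGENT} (Mastodon|http\.rb)/
    RewriteCond /var/run/overloaded -f
    RewriteRule .? - [R=503,L]

    The randomized Retry-After would still need mod_headers or a small 503 ErrorDocument script on top of that.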

    • jwz says:
      5

      Maybe that would work. I dunno. Trying it would require writing some code and... this is a new problem that I have this week that I didn't have last week, so I did the easiest possible thing to make it go away.

      It's not a bandwidth issue, it's a CPU issue. The reason that simply blacklisting the UA works is that the blacklisting happens very early and quickly in the processing of the request (at the Apache layer); actually generating and serving the page runs a lot more code (PHP and mysql).

      There are already people right now typing "well why don't you just" and to those people: please, just, don't.

      • Rodger says:

        This seems like the sort of thing where someone is going to come up with a WordPress plugin for it and you’ll be able to go from there.

        • roeme says:
          2

          It's way too late if you hit wordpress. Do you even computer?

          Unless you meant writing a plugin to reply to people "please just don't suggest X", in which case you get some internet points.

    • derpatron9000 says:
      4

      cue more victim blaming:

      ....frankly, anyone sophisticated enough to think about editing their robots.txt files manually to block Mastodon link previews is probably already going to have a caching proxy in front of their content anyway

      https://github.com/mastodon/mastodon/issues/21738

      • derpatron9000 says:
        3

        replying to myself because, if anyone is "sophisticated enough" to "think about editing" a fucking text file manually they're "probably already" running a caching infrastructure. What fucking planet do these people live on?

        • Rodger says:
          1

          They’re still at the “hardcore engineering” stage of thinking that social problems have technical solutions in general (e.g. “if racists are being racist at you, just run your own instance”; in the real world we call that a ghetto, and if you don’t see why that’s a problem, well… you are the problem).

          And for this it’s a general Stallman problem. “I don’t understand this. Therefore you are wrong.”

  15. greg says:
    1

    Why not use CloudFlare caching? It's free.

  16. jwz says:
    7

    @syskill:

    Everyone who replied with "use a CDN," is really saying, "I expect all web sites to be run by skilled and dedicated professionals, who deploy future-proofed technology stacks, so that my social network can be run by amateur hobbyists, and developed by those who fear what the future might bring."

    • greg says:

      CloudFlare caching is DNS level as opposed to a CDN. This is a solved problem; I don't get the purpose of this article. I'm happy to help you set it up.

      • jwz says:
        10
        1. I didn't ask for and don't need your help,
        2. CloudFlare is run by nazis, so fuck all the way off.
      • suzanne says:
        2

        Technically, it's an HTTP proxy with CDN, WAF, and other security and performance features, that is provisioned by making a DNS change. The object caching doesn't occur at the DNS level; it happens when a visitor makes an HTTP request for that (cacheable) object. If DNS is pointing at the proxy, the request is intercepted by their servers, forwarded on to the origin, and the response gets cached at the PoP that originally forwarded that request, so it's available to be served much more quickly for future requests, within the object's validity period.

  17. Una says:
    2

    I dunno if you specifically would find this useful, but to add to the discussion, yesterday (as a fediverse admin that's been complaining about this problem for years) I stood up jort.link to try to reduce this load, at least from fediverse users aware of the power of their following. Basically a "let us do messy simple caching for you on behalf of fediverse instances" thing — everyone else just gets a 301. (No, it's not based on Cloudflare.)

    The amount of people who think this problem doesn't exist or that it's just a "quick fix" is astounding. I can point to numerous examples of things I've accidentally taken offline just by replying to someone on fedi, and I've got "only" 800 followers — I can only imagine the effects of someone with 10000. It sure would be nice if Gargron could swallow his pride and admit this is a problem in need of a fix, and make any attempt to prioritize that.

    • jwz says:

      This sounds like a reasonable workaround for this dumb situation, thank you!

      Buuuut I keep getting 504 Gateway Timeout when I test it using a Mastodon UA...

      • Una says:
        1

        Yeah, I saw that in the request log. The nginx hack is really, really finicky about SSL for reasons I don't fully understand (the original issue) and sometimes just outright doesn't connect (current issue). I'm currently rewriting it as an actual proper server program using a real HTTP client so it'll stop being so buggy. At the very least I've turned off the (very poorly considered) redirect fallback, so it'll just 502 to the thundering herd now rather than be completely useless.

        • Una says:
          2

          I've just deployed that rewrite, and confirmed it now happily serves a cached copy of your site to a fediverse UA, even behind your shortURLs.

          • jwz says:

            Thanks! It was working last night, but now it's getting 400 Bad Request.

            • Una says:
              1

              It seems for specifically the jwz.org/b/yj65 URL, a 400 response has been cached from your server; I cache even error responses to make good on my request shield promise. I checked that cache entry and it's just a classic Apache ErrorDocument double-fault.

              I've just dropped this cache entry. Let me know if you have a better solution — I don't want to just retry the request or not cache it, due to the nature of this. Maybe a limited number of retries would be okay, but if I were to have implemented that without this context I would likely assume 4XX class errors are not transient. Hm.

              Maybe there's more information in logs on your end? The IP is 192.99.194.128, with this UA: Mozilla/5.0 (jort.link shield; +https://jort.link)

              • jwz says:

                Huh. Weird. I do see the 400 error but nothing else in my logs, and when I try to reproduce that with the same URL and UA I don't get the error. It must be that something went wrong on my end but I can't tell what.

    • roeme says:

      That is indeed a good solution.

      What's the plan if jort.link becomes too popular? I can imagine that at some point the resource requirements will become non-negligible. Is it "yo fedi users/admins gib monies plz (i need to eat as well)" ?

      Or is the long-term goal to demonstrate to the mastodon devs "look guys as we've told you countless times, this _is_ a problem, see here?" and hence finally get a solution, rendering it obsolete?

      • Una says:
        3

        I already run major fediverse services under Jortage, and a relatively large Mastodon instance on a pretty powerful dedicated server. I don't expect jort.link to become too big for me to run, and regardless people are already donating to keep the Jortage project afloat — the Storage Pool is the media storage for 61 instances, and its deduplication has reduced the costs of our members by over 60%, making it well worth the money of many instance admins that moved from S3 or worse.

        Fundamentally, jort.link has very low resource requirements. Once a remote file is cached, it's as expensive as serving a static file — there's no databases or anything involved, and the only point of contention (a synchronize across threads) is avoided if the nearline memory cache has the metadata memoized. The primary concern is bandwidth usage (and I've got plenty of bandwidth) but that's the point of the 8M page limit. And even if I do have to constrain it, these requests do not need to be timely; fedi software will happily wait many seconds to receive their pages, so I can prioritize serving actual browsers and throttle fedi software. Additionally, a previous version of this was proven to be able to run behind bunny.net, a cheap global CDN. I dropped that only because it currently is slowing things down, introducing another point of failure/data processing, and generally doesn't make sense at this stage.

        I've been encouraging other fediverse big players to run their own jort.link instances — due to the nature of this, more instances does not equal more load to the origin server. Whichever instance you pick is self-contained and will only request the origin once. This was part of the motivation for the rewrite; it's hard to reproduce my nginx/dnsmasq/bunny.net house of cards, while a small self-contained Java program is very easy to run. (Certainly easier to run than Mastodon itself.)

        pyrex (who, for the record, I only know of from this comment thread and them pinging me on fedi with some questions about jort.link and general design) is talking to masto.host about patching the Mastodon code they run to go through a masto.host-hosted jort.link instance for media retrieval transparently on the backend, which would assist with this problem quite a bit due to the sheer number of instances they run.

        I do hope Gargron will acknowledge and fix this issue, but after this many years I'm not holding my breath. If jort.link does become obsolete, that's a win.

  18. pyrex says:

    Hm -- this won't solve your problem because someone has to actually implement it, but I've been rolling this around in my head for a while. I'm wondering if Masto users could be persuaded to start setting up (shared) caching proxies specifically for link preview information.

    Basically, as a secondary set of APIs, I think Masto servers should be able to report "this is the link preview info for page X," given any arbitrary URL  X. That should be cached. This way, if you trust an instance you're federating with, you can immediately get the link preview info for them. Otherwise, you can ask some large instance that you trust and stampede _them_ instead.

    This way you can still federate with people who you suspect might use bad link previews to defame people not on your instance. This is not something anyone explicitly wants to do, but seems likely to happen by accident if you have a policy of federating with small instances by default  -- which Masto does.

    Instances that don't want to be stampeded can refuse to publish this service. That being said, the thing they're caching is much smaller than whole web pages and they can probably do everything with a single hit from cache.

    Instances who know that they're the caching proxy for a bunch of other instances would ideally use a different user agent that still matches the (.*Mastodon.*) family of regexes: that way, people who intended to block old!Masto also block new!Masto, but people who know the difference can unblock them manually.

    Overall, this seems to me like it would prevent stampedes while also having better performance for literally anyone running a Fedi instance. It requires basically no additional work from webmasters. (but unfortunately, significant extra work from Fedi developers)

    This is still less efficient than just federating link preview info by default. The real solution is to not federate with people who lie about link previews, and to defederate with them if you find out you were wrong -- I just think it's good to limit harm in the case where people are too lazy to do this, because Masto is designed in a way that inherently leads to transitive trust-style problems.

    I've posted this link on Mastodon today (linked) hoping some Masto admins see it and have opinions on the idea.

    • pyrex says:

      Replying to myself to add: two people on Masto inform me this has been built as an unofficial thing: https://jort.link/ .

      The current implementation appears to put a lot of responsibility on webmasters and Masto users though, while putting basically no responsibility on instance owners. So basically, I think the social incentives are all wrong since it puts all the power on the people who aren't harmed and opts server owners in by default, unless they do user agent filtering manually.

      (My opinions on this seem shared.)

    • jwz says:
      6

      This whole notion of "but what if the link previews are full of lieeeees" is complete nonsense. It's an asinine strawman. A red herring.

      If I post a link and instead of the cat video you were hoping for it's actually a rickroll, that's on me, and the block button is right there.  My instance is already attesting to other instances that I am who I say I am. If I post a link that is not what it purports to be, that's my fault.

      If you are accepting that I am not a malicious actor (which presumably you are by allowing me on your timeline) then you must transitively accept that I am not sending you malicious links.

      (Where "malicious" in this case means nothing more tragic than "the thumbnail and title are different".)

      The instance you post to should retrieve the link preview data, once. When it distributes your post to other servers, it should include that data with it, along with all the other metadata about the post, such as who you are.

      Yes, people can and will post shitty links, and deceptive links. And you can and will block those people for being shitty and/or deceptive people.

      You can't code around social problems, but you absolutely can code around not DDoSsing my server under normal, non-adversarial operation.

      Once again, "Everyone who replied with 'use a CDN' is really saying, 'I expect all web sites to be run by skilled and dedicated professionals, who deploy future-proofed technology stacks, so that my social network can be run by amateur hobbyists."

      And importantly, remember that it's not just the person with a lot of followers who is at risk here. If I, with a lot of followers, were to post a link to someone else's small web site, that person is going to have a bad day. Or even if I reply to a thread in which they were already linked by people with fewer followers! As soon as I say even one word in that thread, that also triggers a stampede on their server.

      • rjp says:

        The instance you post to should retrieve the link preview data, once.

        • rjp says:

          Well, that failed horribly. Couldn't figure out how to get out of the quoted text box.

          Anyway, if you include "the instance you boost from" in the "should retrieve the link preview", that also fixes the "people I don't necessarily trust appearing in my timeline via boosts" problem since that moves the trust back from "random" to "someone I follow".

      • pyrex says:

        Yeah, I thought about it (and read more of the comments) and I think you're right. As a server that is receiving a federated post, I should either trust the information I am given about the link preview, or I should not fetch the link preview at all. Any other behavior can DDOS you in at least one case.

        (And this is less bad than a lot of masto-level misbehavior that _can't_ be autocaught.)

        Someone who says "I don't trust people not to lie about the links" is saying "well, between trusting people not to lie about the links, not having link previews at all, and potentially DDOSing anyone whose link is slightly popular, I picked option 3."

        The feature never should have been released in this state and a lot of people who are making this complaint are basically just attempting to justify pushing costs onto you.

      • boonq says:
        3

        “Have you noticed CloudFlare is free” is probably the single most amusing thing to hear from the peanut gallery.

        How do you solve Mastodon’s distributed problem? Well, step one is to concentrate even more of the internet’s traffic in one giant corporation! Who needs a bird-shaped chokepoint on microblogging when we can have a cloud-shaped chokepoint on everything! (Also, have the 40-something percent of decentralized web sites hosted on WordPress considered… not? Something something database calls something something.)

        The CDN bullshit lets Mastodon cosplay as an independent solution to centralization by forcing everyone else to abandon their independence.

  19. Alan Bell says:

    This may be a dumb solution, and I haven't yet tried it, but the blog could respond to that user agent with little more than a <link> in the head referencing another URL with type "application/json+oembed". Then that second URL returns a scrap of JSON which contains the information to build up the preview card. This could be a lighter-weight response than the full blog page. I am not sure this is really the right place to fix the problem, or if there is a better way to discover an oembed URL than fetching the whole main post just to get a link to the lightweight card.

    • jwz says:
      1

      Even something as simple as "what is the title of this blog post" requires a dive down into WordPress and mysql, so at that point the damage is done. It's not that these are expensive queries but that there are a fuckton of them in a short period of time, and lest we forget, that they are totally fucking unnecessary.

      • Alan Bell says:
        1

        True enough; this saves some bandwidth but not much else. I'll dig a little deeper into the Mastodon code. This looks like a nice feature to fix by doing it properly: it needs a serializer for the card, and then the fetch_link_card service needs to wait a few seconds and then ask the origin server whether it already has a serialized link card before attempting to make one itself.

  20. Peter Morris says:

    I front everything of mine with CloudFlare.

  21. Jessie B. says:

    Hi, I have the same issue with my blog. Would you be kind enough to post the exact rule that you wrote in your .htaccess file to block the Mastodon user agent? I'm not very good with this stuff... thank you! 🙏

    • jwz says:
      1
      # Match the fetcher user agents of Mastodon, Pleroma, Akkoma, Misskey, and GoToSocial
      RewriteCond %{HTTP_USER_AGENT} (http\.rb/\S+\s\(Mastodon|Pleroma\s|Akkoma\s|Misskey/|gotosocial)
      RewriteCond %{REQUEST_URI}     !^/error/
      # ...except for the error page itself; refuse everything else with 403 Forbidden
      RewriteRule .? - [F,END]
