Either Mastodon's link preview bot should obey robots.txt or Mastodon needs O(1) link previews: https://www.jefftk.com/p/mastodons-dubious-crawler-exemption
@jefftk Would this still be an issue if Mastodon lazily built its link preview caches only after the first user requests the link?
Is the issue the total number of requests, or the fact that they are happening automatically, whether or not a specific user requests them?
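A minimal sketch of that lazy approach, in Python rather than Mastodon's actual Ruby code (the cache, TTL, and function names here are all hypothetical):

```python
# Sketch of the lazy-cache idea above: nothing is fetched when a post arrives;
# the first user who views the post triggers a single fetch, and later viewers
# reuse the cached result. All names and limits are illustrative only.
import time
import urllib.request

_preview_cache: dict[str, tuple[float, str | None]] = {}
CACHE_TTL_S = 24 * 3600  # arbitrary: keep a cached preview for a day

def get_preview_html(url: str) -> str | None:
    """Return page HTML for building a preview, fetching at most once per TTL."""
    now = time.time()
    cached = _preview_cache.get(url)
    if cached is not None and now - cached[0] < CACHE_TTL_S:
        return cached[1]
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            html = resp.read(65536).decode("utf-8", errors="replace")
    except Exception:
        html = None  # cache the failure too, so we don't hammer the site
    _preview_cache[url] = (now, html)
    return html
```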
@jefftk I agree that that would satisfy the contract, but it seems a bit rules-lawyer-y to suggest that they can solve the problem by switching to a nearly identical action taken in response to user action - unless you think most instances never *display* the link previews they calculate.
From a practical point of view, the problems identified in this thread don't seem like they would be solved by fetching in response to user action: https://better.boston/@crschmidt/109412294646370820
On the one hand, _some_ of those problems would be solved, because the requests would be spread out over more time (many instances won't have anyone looking at the new post within 60 seconds). To be fair, that spreading could also be done deliberately.
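For what it's worth, a sketch of doing that spreading deliberately (the delay bound and names are made up; Python rather than Mastodon's Ruby):

```python
# Sketch of spreading fetches out manually: rather than every instance fetching
# within the same minute a post federates, add random jitter before the fetch.
# The 10-minute bound is arbitrary and purely illustrative.
import random
import threading
from typing import Callable

def schedule_preview_fetch(url: str,
                           fetch: Callable[[str], None],
                           max_delay_s: float = 600.0) -> None:
    """Run fetch(url) after a random delay so instances don't fetch in lockstep."""
    delay = random.uniform(0.0, max_delay_s)
    threading.Timer(delay, fetch, args=(url,)).start()
```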
Alas, this rules-lawyer-y interpretation is already accepted for embedded images in e-mails: fetching them when the recipient reads the email is considered perfectly fine, and nearly everyone does it (with no knowledge of any relationship, or lack thereof, between the owner of the site hosting the image and the author of the email). An email with an embedded image sent to a mailing list is essentially equivalent to the situation here.
@jefftk I'm no expert, but it seems like the practical problems would be minimized if they used OpenGraph tags to generate the previews and were more efficient about how they do those fetches.
They should still respect robots.txt if performing automated actions, but with efficient preview generation I imagine people would mostly not care.
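A rough sketch of what that could look like, combining the robots.txt check with an OpenGraph-only fetch (standard-library Python; the user agent string and size limit are invented for illustration):

```python
# Sketch of the suggestion above: consult robots.txt before any automated fetch,
# then read just enough of the page to pull the OpenGraph tags used for a preview.
from html.parser import HTMLParser
from urllib import request, robotparser
from urllib.parse import urljoin

USER_AGENT = "ExamplePreviewBot/1.0"  # hypothetical bot name

def allowed_by_robots(url: str) -> bool:
    """Check the site's robots.txt before doing any automated fetch."""
    rp = robotparser.RobotFileParser()
    rp.set_url(urljoin(url, "/robots.txt"))
    try:
        rp.read()
    except Exception:
        return False  # be conservative if robots.txt can't be read
    return rp.can_fetch(USER_AGENT, url)

class OGParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> tags."""
    def __init__(self) -> None:
        super().__init__()
        self.og: dict[str, str] = {}
    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            prop = d.get("property") or ""
            if prop.startswith("og:") and d.get("content"):
                self.og[prop] = d["content"]

def fetch_opengraph_preview(url: str) -> dict[str, str] | None:
    """Return og: tags for a preview, or None if robots.txt disallows the fetch."""
    if not allowed_by_robots(url):
        return None
    req = request.Request(url, headers={"User-Agent": USER_AGENT})
    with request.urlopen(req, timeout=5) as resp:
        # Only read the first 64 KB: OpenGraph tags live in <head>.
        html = resp.read(65536).decode("utf-8", errors="replace")
    parser = OGParser()
    parser.feed(html)
    return parser.og
```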