Either Mastodon's link preview bot should obey robots.txt or Mastodon needs O(1) link previews: https://www.jefftk.com/p/mastodons-dubious-crawler-exemption
@jefftk Would this still be an issue if Mastodon lazily built its link preview caches only after the first user requests the link?
Is the issue the total number of requests, or the fact that they happen automatically, whether or not a specific user requests them?
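A minimal sketch of the lazy scheme described above, in which each instance fetches a preview only when the first user actually asks for the link and serves the cached copy afterward. The names here (`lazy_cached`, the one-day TTL) are hypothetical illustrations, not Mastodon's actual implementation:

```python
import time
from typing import Callable

def lazy_cached(fetch: Callable[[str], dict], ttl: float = 86400.0) -> Callable[[str], dict]:
    """Wrap a preview fetcher so each URL is fetched at most once per TTL,
    and only when a user first requests it, rather than eagerly on every
    post delivery. Hypothetical sketch, not Mastodon's code."""
    cache: dict[str, tuple[float, dict]] = {}

    def get(url: str) -> dict:
        now = time.time()
        hit = cache.get(url)
        if hit is not None and now - hit[0] < ttl:
            return hit[1]  # serve the cached preview; no network traffic
        preview = fetch(url)  # the first user request triggers the one fetch
        cache[url] = (now, preview)
        return preview

    return get
```

Under this scheme the number of fetches per URL is bounded by the number of instances whose users actually open the post, rather than every instance the post federates to.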
@zebrask the latter: unless the requests happen because someone directly asked for them, my interpretation is that robots.txt should apply.
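For what "robots.txt should apply" could look like in practice, here is a minimal sketch using Python's standard-library `urllib.robotparser`; the `MastodonLinkPreview` user-agent string is an assumption for illustration, not something Mastodon actually sends:

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

USER_AGENT = "MastodonLinkPreview"  # hypothetical UA string, chosen for illustration

def allowed_by_robots(url: str) -> bool:
    """Return True if the site's robots.txt permits an automated fetch of url.
    Fetching robots.txt itself is conventionally always allowed."""
    parts = urlparse(url)
    robots = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()
    return robots.can_fetch(USER_AGENT, url)

# Example: only generate a preview when robots.txt allows the fetch.
# if allowed_by_robots(url):
#     preview = fetch_preview(url)  # hypothetical fetcher, defined elsewhere
```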
@jefftk I'm no expert, but it seems like the practical problems would be minimized if they used Open Graph tags to generate the previews (see the sketch below) and were more efficient about how they do those fetches.
They should still respect robots.txt when performing automated actions, but with efficient preview generation I imagine people would mostly not care.
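On the efficiency point: Open Graph tags live in the page's `<head>`, so a preview fetcher only needs the first few kilobytes of the document. A minimal standard-library sketch, where the `PreviewBot` user-agent string and the 64 KB cap are assumptions for illustration:

```python
import urllib.request
from html.parser import HTMLParser

class OGParser(HTMLParser):
    """Collect Open Graph <meta property="og:..." content="..."> tags."""
    def __init__(self) -> None:
        super().__init__()
        self.og: dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: list) -> None:
        if tag == "meta":
            a = dict(attrs)
            prop = a.get("property") or ""
            if prop.startswith("og:") and a.get("content"):
                self.og[prop] = a["content"]

def fetch_og_preview(url: str, max_bytes: int = 65536) -> dict[str, str]:
    """Read only the first max_bytes of the page: the og: tags are in
    <head>, so the full document never needs to be downloaded."""
    req = urllib.request.Request(url, headers={"User-Agent": "PreviewBot"})  # hypothetical UA
    with urllib.request.urlopen(req, timeout=10) as resp:
        head = resp.read(max_bytes).decode("utf-8", errors="replace")
    parser = OGParser()
    parser.feed(head)
    return parser.og  # e.g. {"og:title": ..., "og:image": ...}
```

Capping the read keeps each preview fetch small for the origin server, which addresses the load half of the complaint even before the robots.txt question is settled.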