But at the same time, Cloudflare isn't going to serve me a cached copy from Seattle, Manchester, or Tokyo. Pinning down an unknown Signal user to even a rough geographic location is an important bit of metadata that could combine with other data to unmask an individual. Neat attack!
Although, it has edge use cases even for "normal people":
E.g. you suspect your coworker of catfishing you on, say, Discord. You know he's in your city now, so you verify that, then wait for him to leave on a vacation somewhere abroad and check again.
Cloudflare gets to see a fuckton of metadata from private and group chats, enough to trace who originally sends a piece of media (identifiable from its file size), who reads it, when it is read, who forwards it, and to whom. It really doesn't matter that they can't see an image or video; knowing its size, upfront or later (for example in response to a law enforcement request), is enough.
Say for example that you're an investigating agent in regular contact with someone.
A single data-point wouldn't mean anything. However, a sequence of daily image retrievals might tell you that they spend 90% of their time in WA and 10% of their time elsewhere.
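A rough sketch of that tallying, assuming you've already collected one (day, region) observation per retrieval. The function name and the data are made up for illustration:

```python
from collections import Counter

def location_profile(observations):
    """Return the fraction of observations seen in each region."""
    counts = Counter(region for _day, region in observations)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: n / total for region, n in counts.items()}

# Hypothetical data: ten daily observations, nine in WA and one elsewhere.
observations = [(f"day{i}", "WA") for i in range(9)] + [("day9", "NY")]
profile = location_profile(observations)
print(profile)  # {'WA': 0.9, 'NY': 0.1}
```

One day's entry tells you nothing; the split only shows up once you have a run of them.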
That information alone still might not mean anything, but if you also have a specific suspect in mind, it may help confirm it. Or if you have access to the suspected person directly, if you're able to also befriend their "clean" profile, you might be able to pull the same trick and correlate the two location profiles.
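One simple way to compare two such location profiles is a histogram overlap: sum the smaller share per region. This is just one possible similarity measure, picked here for illustration, and the profiles below are invented:

```python
def profile_overlap(a, b):
    """Histogram intersection of two profiles (dicts of region -> fraction).

    Returns a value between 0.0 (no shared regions) and 1.0 (identical).
    """
    regions = set(a) | set(b)
    return sum(min(a.get(r, 0.0), b.get(r, 0.0)) for r in regions)

# Hypothetical profiles for an anonymous account and a "clean" one.
anon_profile = {"WA": 0.9, "NY": 0.1}
clean_profile = {"WA": 0.8, "CA": 0.2}
unrelated_profile = {"TX": 1.0}

print(profile_overlap(anon_profile, clean_profile))      # high overlap
print(profile_overlap(anon_profile, unrelated_profile))  # no overlap
```

A high overlap doesn't prove anything on its own, it just moves one candidate up the list.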
De-anonymisation isn't about single pieces of information, but all information helps feed into a profile to narrow suspects or confirm suspicions.
(By "agent" I just mean a person, not an AI agent or law enforcement, who could presumably just get the information more directly from Cloudflare.)
Whether this specific level/type of deanonymization is a problem for your particular use case is an entirely different question. Personally, I wouldn't even care if mutual contacts were to see my IP address outright (and they do for calls), but I'm not every user.
There was a real example of that amount of information being relevant in the Silk Road investigation. Ulbricht accidentally revealed his timezone early on, which was useful to US authorities since it narrowed him down to being in the US, whereas without that information he could have been from anywhere in the world.
We used everything: browser fingerprinting (the EFF only made the world aware of it 6 years later), looking them up in databases, tracing every piece of digital evidence they left, etc.
Every little thing counted. What I learned is that people leave a lot of traces and you can collect these traces to dox them. The way you write is even sometimes fairly identifiable.
Or send this to a bunch of Signal users, one of whom you suspect is a particular person. If you know the person you're looking for is going to travel, you can send it once before and once after, then see which of these users was in the home city and subsequently in the destination city.
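The before/after check boils down to a set intersection. A minimal sketch, assuming you already have a region guess per user for each round. The user names and region codes are hypothetical:

```python
def consistent_candidates(before, after, home, destination):
    """Users seen in `home` in the first round and `destination` in the second."""
    at_home = {user for user, region in before.items() if region == home}
    at_dest = {user for user, region in after.items() if region == destination}
    return at_home & at_dest

# Hypothetical observations keyed by user, valued by nearest-datacenter region.
before_trip = {"user_a": "SEA", "user_b": "SEA", "user_c": "LHR"}
after_trip = {"user_a": "SEA", "user_b": "NRT", "user_c": "LHR"}

matches = consistent_candidates(before_trip, after_trip, home="SEA", destination="NRT")
print(matches)  # {'user_b'}
```

Anyone who stayed put, or was never in the home city to begin with, drops out.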
Does the caching occur even if both users are online when the attachment is sent?
For example, imagine someone suspected of sharing sensitive information with a journalist. Investigators might have a short list of suspects and use this technique to confirm which one it is. They might also identify which journalist it is; maybe only a limited number cover this beat.
This is very very bad.
Not necessarily. Cloudflare is very upfront that they do not cache everything, and the time things are cached can vary greatly.
The kid keeps talking about "deanonymization" and he has no idea what the term actually means.
Agree, good writeup, but also a stretch to say they are "pinpointing" anyone's location.
Surprised Signal doesn't have this option.
I only message people I know on Signal anyway.
Edit: it seems Signal does have the option.
The only case where it might be downloaded more than once is if the user has multiple clients. Not that common and still very little traffic.
There are definitely cases where this is going to be used immediately. Shit, just using it to pull additional metadata from Cloudflare on everyone in other leaked user tables would probably yield valuable data. Even triangulation over time as they move around is going to get a more precise result. Maybe you find a vulnerability that takes that Cloudflare node offline and run it again, repeating until you've got a fairly small radius they could be in.
Yeah, and in that case there won't be a data center, because who puts one in places without clients nearby? :)