So if you send a picture to a Signal user, it's retrieved via Cloudflare and cached in a data center near that user; now you can look up the cache status and find the data center used. I'd say "deanonymization" is stretching it, unless the user is in the middle of nowhere (no other users near the data center). But interesting writeup anyway.
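Below is a minimal sketch of reading the cache result from a single probe response. It assumes that Cloudflare's `CF-Ray` header ends with the serving data center's IATA code (for example `...-DFW`) and that `CF-Cache-Status` reports `HIT` when the object was already in that data center's cache. The function names are hypothetical, and the sketch does not show how to force a probe onto a particular data center.

```python
from typing import Mapping, Optional, Tuple


def colo_from_headers(headers: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (data_center_code, cache_status) from one Cloudflare response.

    Assumption: CF-Ray looks like "<ray-id>-<IATA code>", e.g.
    "8f1a2b3c4d5e6f7a-DFW", and CF-Cache-Status holds values such as
    "HIT", "MISS" or "EXPIRED".
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    ray = lowered.get("cf-ray")
    colo = ray.rsplit("-", 1)[-1].upper() if ray and "-" in ray else None
    status = lowered.get("cf-cache-status")
    return colo, status.upper() if status else None


def was_cached_at(headers: Mapping[str, str], colo: str) -> bool:
    """True if this probe was answered by `colo` from an already-cached copy."""
    seen_colo, status = colo_from_headers(headers)
    return seen_colo == colo.upper() and status == "HIT"
```

A `HIT` only shows that someone fetched the object through that data center while it was still cached. As a reply further down notes, Cloudflare does not cache everything, so a `MISS` proves very little.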
"Near a user" is also a big assumption. I'm ~200 miles from ORD and ~500 from IAD, but my ISP's peering & upstream arrangements mean Cloudflare serves my traffic from DFW, 700 miles away.

But, at the same time: Cloudflare isn't going to serve me a cache from Seattle, Manchester, or Tokyo. Pinning down an unknown Signal user to even a rough geographic location is an important bit of metadata that could combine to unmask an individual. Neat attack!

For "normal people", that's a pain, but with enough resources...

Although, it has edge use cases even for "normal people":

E.g. you suspect your coworker of catfishing you on, e.g., Discord. You know he's in your city now, so you verify that, then wait for him to leave on vacation somewhere abroad and check again.

It gets more interesting when you think about the impact on groups. Sending an image to a group is enough for all devices associated with that group to be identifiable from CloudFlare's side, who additionally see a giant chunk of unencrypted traffic from the same client addresses going to other web sites. Given Cloudflare's less-than-straight approach to sales, it is astonishing the words "secure" and "Signal" ever appear in the same sentence.

CloudFlare get to see a fuckton of metadata from private and group chats, enough to trace who originally sends a piece of media (identifiable from its file size), who reads it, when it is read, who forwards it and to whom. It really doesn't matter that they can't see an image or video; knowing its size upfront or later (for example in response to a law enforcement request) is enough.
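To show how size alone can link transfers, here is a minimal sketch over a hypothetical CDN log. The `Transfer` record, its fields and the sample addresses are all made up for illustration. The sketch assumes attachment sizes are distinctive enough not to collide, which will not always hold.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Transfer(NamedTuple):
    # Hypothetical CDN log entry; field names are illustrative only.
    client: str       # client IP address
    direction: str    # "up" for an upload, "down" for a download
    size_bytes: int   # size of the (encrypted) object
    timestamp: float  # seconds since epoch


def trace_by_size(transfers: List[Transfer]) -> Dict[int, Tuple[List[str], List[str]]]:
    """Map each object size to (uploaders, downloaders), both in time order.

    The payload stays opaque; the size is the only linking key.
    """
    by_size: Dict[int, List[Transfer]] = defaultdict(list)
    for t in transfers:
        by_size[t.size_bytes].append(t)
    traced: Dict[int, Tuple[List[str], List[str]]] = {}
    for size, group in by_size.items():
        group.sort(key=lambda t: t.timestamp)
        uploaders = [t.client for t in group if t.direction == "up"]
        downloaders = [t.client for t in group if t.direction == "down"]
        traced[size] = (uploaders, downloaders)
    return traced
```

For a given size, the earliest uploader is the likely original sender and the downloaders are its readers. A later upload of the same size would point to a forward.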

It could be useful for correlation.

Say for example that you're an investigating agent in regular contact with someone.

A single data-point wouldn't mean anything. However, a sequence of daily image retrievals might tell you that they spend 90% of their time in WA and 10% of their time elsewhere.
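The daily-retrieval idea amounts to counting which data center each probe lands on and turning the counts into time shares. This is a minimal sketch. The function name is hypothetical, and it assumes each day's probe has already been resolved to a single data-center code such as "SEA".

```python
from collections import Counter
from typing import Dict, Iterable


def location_profile(daily_colos: Iterable[str]) -> Dict[str, float]:
    """Share of observed days attributed to each data center.

    Each item is the data-center code inferred for one day's probe;
    the result maps code -> fraction of days.
    """
    counts = Counter(daily_colos)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {colo: n / total for colo, n in counts.items()}
```

For example, `location_profile(["SEA"] * 27 + ["ORD"] * 3)` gives `{'SEA': 0.9, 'ORD': 0.1}`, which matches the "90% in WA" picture, provided SEA really is where that person's traffic lands.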

That information alone still might not mean anything, but if you also have a specific suspect in mind, it may help confirm it. Or, if you have direct access to the suspected person and can also befriend their "clean" profile, you might be able to pull the same trick and correlate the two location profiles.
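One simple way to "correlate the two location profiles" is cosine similarity over the data-center shares. This is a sketch that uses that measure as an illustrative choice, not an established method from the writeup. Each profile is assumed to be a dict mapping data-center code to share of days.

```python
import math
from typing import Dict


def profile_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two location profiles (data-center code -> share).

    1.0 means both accounts were seen in the same places in the same
    proportions; 0.0 means the profiles share no data center at all.
    """
    dot = sum(share * b.get(colo, 0.0) for colo, share in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A high score does not prove anything by itself, because many people share a metro area. It becomes meaningful only when the profiles are unusual, for example frequent trips to the same secondary location.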

De-anonymisation isn't about single pieces of information, but all information helps feed into a profile to narrow suspects or confirm suspicions.

(By "agent" I just mean a person, not an AI agent or law enforcement, who could presumably get the information more directly from Cloudflare.)

It's not stretching it. The expectation is that Signal does not reveal any observable aspect of your IP address or location when receiving messages on it.

Whether this specific level/type of deanonymization is a problem for your particular use case is an entirely different question. Personally, I wouldn't even care if mutual contacts were to see my IP address outright (and they do for calls), but I'm not every user.

"Deanonymization" doesn't have to refer to a full exact address. There are people who wish to conceal which country or region they live in, which this cripples.

There was a real example of that amount of information being relevant in the Silk Road investigation. Ulbricht accidentally revealed his timezone early on, which was useful to US authorities since it narrowed him down to being in the US, whereas without that information he could have been from anywhere in the world.

When I was ~15 and this was ~2004, some friends and I ran a forum with a lot of users and did some bad things where we would track down repeatedly banned users and screw with them. (In our defense, they were screwing with us.)

We used everything: browser fingerprinting (the EFF only made the world aware of it 6 years later), looking them up in databases, tracing every piece of digital evidence they left, etc.

Every little thing counted. What I learned is that people leave a lot of traces and you can collect these traces to dox them. The way you write is even sometimes fairly identifiable.

If I know someone on Signal I can now check if they’ve left the country.

Or send this to a bunch of Signal users, one of whom you suspect is a particular person. If you know that person is going to travel, you can send it once before and once after the trip, then see which of these users were in the home city and subsequently in the destination city.
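The before/after filter is just a set comprehension over two rounds of observations. This is a minimal sketch. The account names and data-center codes are hypothetical, and it assumes each send could be resolved to one data center per account.

```python
from typing import Dict, Set


def matching_travellers(
    before: Dict[str, str], after: Dict[str, str], home: str, destination: str
) -> Set[str]:
    """Accounts seen at `home` before the trip and at `destination` after it.

    `before` and `after` map an account to the data center whose cache
    the attachment landed in for that round of sends.
    """
    return {
        account
        for account, colo in before.items()
        if colo == home and after.get(account) == destination
    }
```

Usage, with made-up data: `matching_travellers({"alice": "AMS", "bob": "AMS", "carol": "FRA"}, {"alice": "NRT", "bob": "AMS", "carol": "NRT"}, "AMS", "NRT")` leaves only `alice`.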

The real attack is that a law enforcement agency can trivially subpoena CloudFlare with the attachment URL. CloudFlare will hand over the IP address of the recipient of the image, along with whatever other requests that address made through the CDN, which can pretty precisely and rapidly de-anonymize you.
Indeed, "incredibly precise estimate of the user's location" feels like an exaggeration. But still, very interesting!
I'd say it'd be useful for very specific use cases. Such as finding out what country Jia Tan, the XZ Utils backdoor attacker, is in.
I wonder if it'd be a good idea for Signal to implement a "simple" mode that would deactivate most features in order to reduce the attack surface for people who really think they are being targeted. Would that be a good idea?
Caching attachments at a single nice, big, juicy honeypot like CloudFlare is one of the reasons Signal's privacy guarantees don't feel totally solid to me. I get that it's pragmatic, but feel there must be a better way.

Does the caching occur even if both users are online when the attachment is sent?

Even time zone leaks are privacy issues, and the leak we're discussing is more fine grained than time zone.
It only takes 33 bits to identify someone. This reveals a couple of bits.
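The arithmetic behind the "33 bits" figure, as a worked sketch. It assumes a world population of about 8 billion and treats an observation shared by a fraction p of users as revealing −log₂ p bits. The value p = 1/4 is purely illustrative and corresponds to "a couple of bits".

```latex
\log_2\!\left(8 \times 10^{9}\right) \approx 32.9 \ \text{bits}

I(\text{colo}) = -\log_2 p_{\text{colo}}, \qquad
p_{\text{colo}} = \tfrac{1}{4} \;\Rightarrow\; I = 2 \ \text{bits}
```

Here p_colo is the share of the user population whose traffic lands at the observed data center. Less-populated catchments give a smaller p and therefore more bits.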
Combined with other information, it may identify someone reliably, just like you can with zip code, age and gender. For example, if you know this person is part of a group with members in several locations, or if you can corroborate someone's movements, etc.
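Combining attributes works because, for independent attributes, the bits add. Equivalently, the population fractions multiply. This is a minimal sketch with hypothetical function names. The independence assumption is doing real work here, because correlated attributes reveal less than the sum.

```python
import math
from typing import Iterable


def bits_revealed(fractions: Iterable[float]) -> float:
    """Self-information, in bits, of observing several attributes at once.

    Each fraction is the share of the population with the observed value;
    assumes the attributes are independent, so the bits simply add.
    """
    return sum(-math.log2(p) for p in fractions)


def remaining_candidates(population: int, fractions: Iterable[float]) -> float:
    """Expected number of people still matching all observations."""
    return population * math.prod(fractions)
```

For instance, two attributes each shared by half and a quarter of a 1,000-person group reveal `bits_revealed([0.5, 0.25])` = 3 bits and leave about `remaining_candidates(1000, [0.5, 0.25])` = 125 people.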

For example, imagine someone suspected of sharing sensitive information with a journalist. They might have a short list of suspects, and use this technique to confirm which one it is. They might identify which journalist it is - maybe only a limited number cover this beat.

It's leaking so many bits, idk what else you would call it. Deanonymization isn't a one-shot thing; it's a spectrum, not a binary outcome.
CloudFlare has the actual IP address that viewed the image. Which means some powerful (or rich enough) actors can get it.

This is very very bad.

> cached in a data center near that user

Not necessarily. Cloudflare is very upfront that they do not cache everything, and the time things are cached can vary greatly.

The kid keeps talking about "deanonymization" and he has no idea what the term actually means.

> attacker can use the cache geolocation method to pinpoint the recipient’s location

Agree, good writeup, but also a stretch to say they are "pinpointing" anyone's location.

Send a picture to multiple accounts, perhaps on different services; accounts whose links end up cached at the same data center can more confidently be believed to be related.
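The cross-account grouping can be sketched as a bucket-by-data-center step. The account labels are hypothetical. The sketch also assumes each service's media fetch can be geolocated the same way, which only holds for services fronted by a cache you can probe.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def accounts_by_colo(observations: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (account, data_center) observations.

    Only data centers reached by more than one account are kept; those
    accounts are candidates for belonging to the same person.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for account, colo in observations:
        groups[colo].append(account)
    return {colo: sorted(accts) for colo, accts in groups.items() if len(accts) > 1}
```

A shared data center is weak evidence on its own. Repeating the sends over time and keeping only the pairs that keep landing together is what raises the confidence.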
This is not unique to Signal. URL strings can contain identifying information regardless of where they are shared or posted. For example, if you send a link that ends with a string of characters, these may correspond to a geographic location or browser settings. Blogger URLs used to be geolocated, such as .ca for Canadian viewers. It is always safe to strip out unnecessary characters if you're paranoid.
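Stripping a link before sharing it can be done with the standard library's `urllib.parse`. This is a minimal sketch that drops the fragment and every query parameter except an allow-list. The allow-list exists because some links stop working without certain parameters. It does not touch identifiers embedded in the path or the domain, such as Blogger's country TLDs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_url(url: str, keep: frozenset = frozenset()) -> str:
    """Drop the fragment and every query parameter whose name is not in `keep`."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k in keep
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

For example, `strip_url("https://example.com/post?utm_source=x&id=7#frag", frozenset({"id"}))` returns `"https://example.com/post?id=7"`.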
WhatsApp has an option to disable link previews.

Surprised Signal doesn't have this option.

I only message people I know on Signal anyway.

Edit: it seems Signal does have the option

Why would cloudflare ever operate a data center that only one user at a time is ever near?
Looks like it's possible to hit 2 datacenters due to load-balancing, which would narrow it down a bit more. Suppose you do this repeatedly as the target is moving around, hitting even more datacenters.
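Load balancing onto two data centers can narrow things down. The idea is to intersect the areas each data center could plausibly be serving. This is a minimal sketch, and it assumes the target stays in one place across the observations. A moving target needs a trajectory, not an intersection. The `catchment` map is hypothetical and would have to be built by the attacker, since real routing depends on ISP peering.

```python
from typing import Dict, Iterable, Optional, Set


def narrow_regions(hits: Iterable[str], catchment: Dict[str, Set[str]]) -> Set[str]:
    """Intersect the candidate areas of every data center that served the target.

    `catchment` maps a data-center code to the set of areas whose traffic
    tends to land there (an attacker-built, approximate map).
    """
    result: Optional[Set[str]] = None
    for colo in hits:
        areas = catchment.get(colo, set())
        result = set(areas) if result is None else result & areas
    return result or set()
```

With a made-up map such as `{"ORD": {"Chicago", "Indianapolis", "Milwaukee"}, "IAD": {"Indianapolis", "Washington"}}`, seeing both ORD and IAD leaves only `{"Indianapolis"}`.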
You underestimate the value of this piece of information taken at different times. It can be enough to know in which country a person was yesterday or is today.
Why does it need to be cached though?

The only case where it might be downloaded more than once is if the user has multiple clients. Not that common and still very little traffic.

That's why federated setups such as Matrix are better: it is much harder to deanonymize a set of users spread across different servers in a group chat.
Did you see the GIF? It's able to triangulate.
Mmmm "qualified deanonymization" perhaps?
Imagine sending a friend request to bin Laden's videographer and getting a reply from Pakistan while your entire military is looking for him in Afghanistan?

There are definitely cases where this is going to be used immediately. Shit, just using it to scrape Cloudflare for additional metadata on everyone in other leaked user tables would probably yield valuable data. Even triangulation over time as they move around is going to get a more precise result. Maybe you find a vulnerability that takes that Cloudflare node offline and run it again, repeating until you've got a fairly small radius they could be in.

Headline feels like clickbait :)
Timing and location can usually prune things down to enough data about a person.
> (no other users near the data center).

Yeah and in that case there won't be a data center because who puts one in places without clients nearby? :)
