Story Detail of id 48482218 | Liveview Hacker News

adzm 22 hours ago | on: πFS

It is worth noting that as the length of data increases it becomes extremely unlikely that the index and length of the sequence within pi would actually be smaller than the data.

Aloisius 22 hours ago | parent | next

That seems easy enough to solve. Simply record the index and length in pi of the index and length in pi.

awesome_dude 22 hours ago | root | parent

See also: cofixpoints of co-algebras

jesuslop 17 hours ago | root | parent

check mate

jastr 19 hours ago | parent | next

Back in college, I thought I could compress my phone number by telling people its index in pi, but my 7-digit phone number is at an 8-digit index.

I didn’t have the compute to find my 10-digit number with the area code.

xavortm 15 hours ago | root | parent

HEX should've solved for char length?

mondrian 21 hours ago | parent | next

The index of your 20 line file is <20TB number>

russfink 20 hours ago | root | parent

Unless, in turn, you locate the index itself in pi at a much smaller index. And so on...

Find k candidate indices for your data, then locate each of them. If the smallest one is a significantly smaller index space, repeat.

Galanwe 8 hours ago | root | parent | next

It's recursive as well, you now need to store how many levels of indirection of indices you had to resolve, which will in turn take 20TB to store, unless you store that in pi as well, which in turn...

akoboldfrying 19 hours ago | root | parent

Can't tell if you're in on the joke or not, but for anyone who is genuinely wondering whether this might work: Consider that there are at most 256 different indexes that could be represented by a 1-byte index value, but if you're trying to store 9 bits of data, there are already 512 different possible things it could be that each need to be represented by a different index value, otherwise you won't be able to tell them apart. Those pigeons aren't gonna fit.
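The counting argument above can be checked by brute force. This is only a minimal sketch: `toy_encode` is a hypothetical stand-in for "whatever scheme maps 9-bit data to a 1-byte index", not anything from πFS, and `colliding_inputs` is a helper name invented for the illustration.

```python
def colliding_inputs(encode, data_bits):
    """Group every data_bits-bit input by its encoded value.

    Returns only the groups where more than one input shares an encoding,
    i.e. the inputs a decoder could not tell apart.
    """
    groups = {}
    for value in range(2 ** data_bits):
        groups.setdefault(encode(value), []).append(value)
    return {code: inputs for code, inputs in groups.items() if len(inputs) > 1}


def toy_encode(value):
    # Hypothetical encoder: squeeze the input into a single byte (256 values).
    return value % 256


# 512 possible 9-bit inputs, only 256 possible 1-byte indexes.
collisions = colliding_inputs(toy_encode, 9)
print(len(collisions), "index values are shared by more than one input")
```

Swapping in any other function that returns a value in 0..255 still leaves at least 256 of the 512 inputs sharing an index with another input, which is the pigeonhole point.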

jonhohle 14 hours ago | root | parent

That’s what variable length encoding is for!

12_throw_away 22 hours ago | parent | next

yes I believe that's the joke

jwpapi 22 hours ago | root | parent

He’s aware, he just added some curious information.

hatthew 20 hours ago | parent | next

TFA addresses this

> Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found.

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.
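A rough sketch of what per-byte lookup could look like, under loud assumptions: this is not the πFS code. It computes hexadecimal digits of π with a Machin-type arctangent formula in integer arithmetic, searches at any hex-digit offset (not only byte-aligned ones), and all function names are made up for the illustration.

```python
def arctan_inverse(x: int, scale: int) -> int:
    """Fixed-point value of arctan(1/x) * scale, via the alternating Taylor series."""
    power = scale // x            # holds scale / x**(2k+1)
    total = power
    x_squared = x * x
    divisor = 1
    sign = -1
    while power:
        power //= x_squared
        divisor += 2
        total += sign * (power // divisor)
        sign = -sign
    return total


def pi_hex_digits(count: int) -> str:
    """First `count` hex digits of pi after the point (pi = 3.243f6a88... in hex)."""
    guard = 10                    # extra hex digits to absorb rounding error
    scale = 16 ** (count + guard)
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 4 * (4 * arctan_inverse(5, scale) - arctan_inverse(239, scale))
    fraction = pi_scaled - 3 * scale
    return format(fraction >> (4 * guard), "x").zfill(count)


def encode(data: bytes, digits: str) -> list[int]:
    """Replace each byte by the first offset where its two hex digits occur."""
    indexes = []
    for byte in data:
        position = digits.find(f"{byte:02x}")
        if position < 0:
            raise ValueError(f"byte {byte:#04x} not found; compute more digits")
        indexes.append(position)
    return indexes


def decode(indexes: list[int], digits: str) -> bytes:
    """Read two hex digits back at each stored offset."""
    return bytes(int(digits[i:i + 2], 16) for i in indexes)
```

Note that each stored offset is an integer that usually needs at least as many bits as the byte it replaces, which is the objection raised at the top of this thread.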

ithkuil 19 hours ago | root | parent

Why stop at bytes? Let's split it in individual bits and then look up the bits in pi!

But Pi's binary expansion is not very practical for this purpose, since it's 11.0010...

OTOH, e is 10.1011...

Let's stick to fractional digits (the ones right of the binary point): at index 0 we have 1 and at index 1 we have 0.

So, to encode a stream of bytes so that each bit is encoded as the index of that bit in e, all you need to do is XOR each byte with 0xFF.
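A small sketch to check that claim, assuming only the four fractional bits of e quoted above (`1011`). It does the bit-by-bit lookup the long way and compares it with the XOR shortcut. The function names are invented for the illustration.

```python
# Fractional binary digits of e as quoted in the comment: e = 10.1011... in base 2.
E_FRACTION_BITS = "1011"


def encode_via_lookup(data: bytes) -> bytes:
    """Replace every bit with the index of its first occurrence in e's fraction."""
    out = bytearray()
    for byte in data:
        encoded = 0
        for shift in range(7, -1, -1):           # most significant bit first
            bit = (byte >> shift) & 1
            index = E_FRACTION_BITS.index(str(bit))  # '1' -> 0, '0' -> 1
            encoded = (encoded << 1) | index
        out.append(encoded)
    return bytes(out)


def encode_via_xor(data: bytes) -> bytes:
    """The shortcut from the comment: flip every bit."""
    return bytes(byte ^ 0xFF for byte in data)


every_byte = bytes(range(256))
print(encode_via_lookup(every_byte) == encode_via_xor(every_byte))
```

Since every bit maps to exactly one other bit, the "compressed" output is the same size as the input, and decoding is the same XOR again.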

nvader 18 hours ago | root | parent | next

Hang on hang on let me write a CUDA kernel for this. This is going to be really huge.

hatthew 18 hours ago | root | parent

genius

account42 10 hours ago | parent | next

That just means you'll be creating even more valuable metadata to store your files. Win-win.

bandrami 11 hours ago | parent | next

At least as of 15 years ago when I was in grad school that remained an open conjecture.

liamYC 20 hours ago | parent

Point taken about the index potentially being really long. Why would the length be longer than the data? Don’t you need to find the right sequence?

gowld 18 hours ago | root | parent

For a given length of data, considering all possible data of that length, it's impossible for the median index length to be shorter than the data length. There aren't enough strings of that length that early in pi.
