Surely the loss in precision of a 32KB LUT for double precision asin() would be unacceptable?
By interpolating between values you can get excellent results with LUTs much smaller than 32KB. Will it be faster than the computation from op, that I don't know.
I experimented a bit with the code. Various tables with different datatypes. There is enough noise from the Monte Carlo to not make a difference if you use smaller data types than double or float. Even dropping interpolation worked fine, and got the speed to be on par with the best in the article, but not faster.
loading story #47338125
loading story #47337485