Hacker News
WithinReason | 14 hours ago | on: Do transformers need three projections? Systematic study of QKV variants
Not everyone can afford millions to publish a paper.
spindump8930 | 4 hours ago
That's why you run several small- and medium-scale tests, fit a curve, and ideally show that the trend persists at several scales, rather than relying on a single large or medium run. See the other comments downthread for example sizes.
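For anyone who wants the "fit a curve" step spelled out, here is a rough sketch, assuming the loss follows a pure power law in model size (loss ≈ a * size**b), which is a simplification. The function names, the sizes, and the numbers are all made up for illustration: the points are generated from a known power law, not taken from any real run. The idea is to fit on the smaller runs and check that the extrapolation matches the largest one you can afford.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss ~= a * size**b by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    return float(np.exp(intercept)), float(slope)


def predict(a, b, size):
    """Evaluate the fitted power law at a given size."""
    return a * size ** b


# Synthetic placeholder points generated from a known power law.
# These are NOT measurements; replace them with your own runs.
sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
losses = 20.0 * sizes ** -0.1

# Fit on the smaller runs only, then extrapolate to the largest one.
a, b = fit_power_law(sizes[:-1], losses[:-1])
held_out = predict(a, b, sizes[-1])
relative_error = abs(held_out - losses[-1]) / losses[-1]
```

With real runs the points will be noisy, so the interesting part is whether `relative_error` stays small as you hold out larger and larger scales, and whether two variants keep the same ordering along the fitted curves.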