Hacker News
ACCount37 | 18 hours ago | on: Do transformers need three projections? Systematic study of QKV variants
I wonder whether some of those synthetics that specifically burn in an attention inductive bias could help there, i.e. by getting attention to converge faster than it normally would?