Story Detail of id 48409486 | Liveview Hacker News

spacebacon 12 hours ago | on: Fine-tuning an LLM to write docs like it's 1995

Now do it without the fine tuning.

The HF zool4nd3r demo may be useful

Your method appears to be similar to LoRA but simply less expressive. Some kind of manipulation to layers 7, 14, and 21. Did you compare with other layers? This is obviously extremely specific to a particular backbone.
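For readers unfamiliar with the comparison, here is a minimal sketch of what a LoRA-style update restricted to a few layers looks like. It is not the repo's method. The 24-layer toy stack, the matrix sizes, and the helper names `matmul`, `lora_delta`, and `apply_lora` are all hypothetical; only the layer indices 7, 14, and 21 come from the comment above.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_delta(A, B, alpha, r):
    """Low-rank update (alpha / r) * B @ A, with B: d_out x r and A: r x d_in."""
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]

def apply_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A without modifying the frozen W."""
    delta = lora_delta(A, B, alpha, r)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

random.seed(0)
d_out, d_in, r, alpha = 8, 8, 2, 4.0
NUM_LAYERS = 24               # hypothetical depth of the toy stack
TARGET_LAYERS = (7, 14, 21)   # layer indices mentioned in the comment

# Frozen base weights, one toy matrix per layer.
weights = {
    i: [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
    for i in range(NUM_LAYERS)
}

adapted = {}
for i, W in weights.items():
    if i in TARGET_LAYERS:
        A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
        B = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
        adapted[i] = apply_lora(W, A, B, alpha, r)
    else:
        adapted[i] = W  # untouched layers keep their frozen weights

# Parameters trained by the adapters vs. fully fine-tuning the same three matrices.
trainable = len(TARGET_LAYERS) * r * (d_in + d_out)
full = len(TARGET_LAYERS) * d_in * d_out
print(trainable, full)
```

Comparing against other layers, as the comment asks, would amount to changing `TARGET_LAYERS` and rerunning the same evaluation.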

Also, your documents use a ton of nonstandard jargon, which only serves to confuse laypeople and annoy anyone who is familiar with ML. Saying your change adds “semiotic awareness” is meaningless when your experiments claim only marginal improvements. Clearly the model had most of the capability before.

More generally, who is it for? People who have expertise in ML are not going to take it seriously. People who don’t?

anentropic 11 hours ago | parent | next

Tip: neither the "30 second TL;DR" nor the intro paragraph above it really explains, to anyone unfamiliar with your (possibly novel?) jargon, what it does.

janalsncm 10 hours ago | root | parent | next

“Semiotic awareness” is not standard ML terminology. By its dictionary definition, semiotic simply means “relating to symbols”, so it’s a bit grandiose to say you gave Qwen “awareness of symbols” when in reality it’s a marginal improvement, if even true.

Also, to say that a philosopher who died 100 years ago inspired a new attention head is another instance of GPT off its rocker again. You don’t need MAH to contextualize “freedom” in a sentence. Attention already does that.
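To make the claim about attention concrete, here is a minimal sketch of single-query scaled dot-product attention. It is a simplification: real models apply learned query, key, and value projections and use many heads. The two-dimensional embeddings in `EMB` and the helper `contextualize` are hypothetical, chosen only to show that the same word ends up with a different vector depending on its neighbors.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

# Hypothetical toy embeddings: axis 0 ~ "politics", axis 1 ~ "mechanics".
EMB = {
    "freedom": [1.0, 1.0],
    "political": [2.0, 0.0],
    "degrees": [0.0, 2.0],
    "of": [0.1, 0.1],
}

def contextualize(word, sentence):
    """Mix the word's vector with its context (identity projections assumed)."""
    vecs = [EMB[w] for w in sentence]
    return attend(EMB[word], vecs, vecs)

political = contextualize("freedom", ["political", "freedom"])
mechanical = contextualize("freedom", ["degrees", "of", "freedom"])
print(political, mechanical)
```

In this toy setup, the vector for "freedom" leans toward the first axis next to "political" and toward the second next to "degrees".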

spacebacon 11 hours ago | root | parent

Thank you. I would appreciate additional feedback on how I can improve that.

Edit: it's not GPT, nor off its rocker. This repo empirically proved computational semiotics with reference to C.S. Peirce, Paul Kockelman, and many other respected contemporary semioticians.

nextaccountic 10 hours ago | parent

How does this help with making an LLM write in a particular style present in a large corpus? Is there a training step? Or can SRT use the raw data as is? (That seems infeasible.)

Also is SRT really suitable for style transfer?

I mean, this seems to be another network overlaid on top of the LLM, steering it, but it needs some target to determine whether the underlying LLM has drifted away from it.
