Ask HN: Is anyone doing anything cool with tiny language models?

I have ollama responding to SMS spam texts. I told it to feign interest in whatever the spammer is selling/buying. Each number gets its own persona, like a millennial gymbro or 19th century British gentleman.

http://files.widloski.com/image10%20(1).png

http://files.widloski.com/image11.png

This is fantastic. How have you hooked up a mobile number to the LLM?
An Android app that forwards to a Python service on a remote workstation over MQTT. I can make a Show HN if people are interested.
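A minimal sketch of that kind of bridge, assuming the paho-mqtt and ollama Python packages (the topic names, model tag, and persona prompt here are placeholders, not the actual ones):

    # SMS -> MQTT -> small local model -> reply published back over MQTT.
    # The Android app is assumed to publish incoming texts to "sms/incoming"
    # and to send whatever appears on "sms/outgoing".
    import json

    import ollama
    import paho.mqtt.client as mqtt

    PERSONA = "You are a 19th-century British gentleman, terribly keen on whatever is being sold."

    def on_message(client, userdata, msg):
        sms = json.loads(msg.payload)  # e.g. {"from": "+1555...", "body": "..."}
        reply = ollama.chat(
            model="llama3.2:3b",  # placeholder model tag
            messages=[
                {"role": "system", "content": PERSONA},
                {"role": "user", "content": sms["body"]},
            ],
        )["message"]["content"]
        client.publish("sms/outgoing", json.dumps({"to": sms["from"], "body": reply}))

    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt 2.x
    client.on_message = on_message
    client.connect("localhost", 1883)
    client.subscribe("sms/incoming")
    client.loop_forever()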
I’d love to see that. Could you simulate iMessage?
Yes it’s possible, but it’s not something you can easily scale.

I had a similar project a few years back that used OS X automations, Shortcuts, and Python to send a message every day to a friend. It required you to be signed in to iMessage on your MacBook.

That was a send operation; I never implemented reading replies, but I know there is a file somewhere that holds a history of your recent iMessages. You would have to parse it on each file update, and that should give you the read operation so you can have a conversation.

Very doable in a few hours, unless something dramatic has changed in how the Messages app works within the last few years.

If you mean hooking this into iMessage, I don't know. I'm willing to bet it's way harder, though, because Apple locks iMessage down.
Why MQTT over HTTP for a low volume, small scale integration?
I’m not OP, but I would hazard a guess that those are the tools that OP has at hand.
Yes, I'd be interested in that!
I have a mini PC with an N100 CPU connected to a small 7" monitor sitting on my desk, under the regular PC. I have Llama 3B (Q4) generating endless stories in different genres and styles. It's fun to glance over at it and read whatever it's in the middle of writing. I gave llama.cpp one CPU core, so it generates slowly enough to read at a normal pace and the CPU fans don't go nuts. Totally not productive or really useful, but I like it.
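For reference, a minimal version of this kind of setup using the llama-cpp-python bindings might look like the following (the model filename, genres, and sampling settings are illustrative assumptions):

    # Endless slow story generation pinned to a single CPU thread.
    # Assumes `pip install llama-cpp-python` and a local Q4 GGUF of a ~3B model.
    import random
    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-3.2-3b-instruct-q4_k_m.gguf",  # illustrative filename
        n_threads=1,  # one core: slow, readable output and quiet fans
        n_ctx=2048,
    )

    genres = ["noir detective", "cozy fantasy", "hard sci-fi", "ghost story"]

    while True:
        prompt = f"Write a short {random.choice(genres)} story.\n\n"
        for chunk in llm(prompt, max_tokens=512, temperature=0.9, stream=True):
            print(chunk["choices"][0]["text"], end="", flush=True)
        print("\n\n---\n")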
Microsoft published a paper on their FLAME model (60M parameters) for Excel formula repair/completion which outperformed much larger models (>100B parameters).

https://arxiv.org/abs/2301.13779

That paper is from over a year ago, and it compared against codex-davinci... which was basically GPT-3, from what I understand. Saying >100B makes it sound a lot more impressive than it is in today's context... 100B models today are a lot more capable. The researchers also compared against a couple of other ancient(/irrelevant today), small models that don't give me much insight.

FLAME seems like a fun little model, and 60M is truly tiny compared to other LLMs, but I have no idea how good it is in today's context, and it doesn't seem like they ever released it.

I would like to disagree with calling it irrelevant. If anything, the 100B models are irrelevant in this context and should be seen as a "fun inclusion" rather than a serious baseline worth comparing against. Outperforming a 100B model at the time makes for a fun bragging point, but it's not the core value of the method or the paper.

Running a prompt against every single cell of a 10k-row document was never going to happen with a large model. Even using a transformer architecture in the first place can be seen as ludicrous overkill, though it's feasible on modern machines.

So I'd say the paper is very relevant, and the top commenter in this very thread demonstrated their own homegrown version with a very nice use case (sorting paper abstracts and titles to put together a summary paper).

> Running a prompt against every single cell of a 10k row document was never gonna happen with a large model

That isn’t the main point of FLAME, as I understood it. The main point was to help you when you’re editing a particular cell. codex-davinci was used for real time Copilot tab completions for a long time, I believe, and editing within a single formula in a spreadsheet is far less demanding than editing code in a large document.

After I posted my original comment, I realized I should have pointed out that I’m fairly sure we have 8B models that handily outperform codex-davinci these days… further driving home how irrelevant the claim of “>100B” was here (not talking about the paper). Plus, an off the shelf model like Qwen2.5-0.5B (a 494M model) could probably be fine tuned to compete with (or dominate) FLAME if you had access to the FLAME training data — there is probably no need to train a model from scratch, and a 0.5B model can easily run on any computer that can run the current version of Excel.

You may disagree, but my point was that claiming a 60M model outperforms a 100B model just means something entirely different today. Putting that in the original comment higher in the thread creates confusion, not clarity, since the models in question are very bad compared to what exists now. No one had clarified that the paper was over a year old until I commented… and FLAME was being tested against models that seemed to be over a year old even when the paper was published. I don’t understand why the researchers were testing against such old models even back then.

But I feel we're coming full circle. These small models are not generalists, and thus not really LLMs, at least in terms of objective. Recently there has been a rise of "specialized" models that provide a lot of value, but that's not what we were sold on with LLMs.
Specialized models still work much better for most tasks. Really, we need an LLM to understand the input and then hand it off to a specialized model that actually provides good results.
I think playing word games about what really counts as an LLM is a losing battle. It has become a marketing term, mostly. It’s better to have a functionalist point of view of “what can this thing do”.
https://gophersignal.com – I built GopherSignal!

It's a lightweight tool that summarizes Hacker News articles. For example, here’s what it outputs for this very post, "Ask HN: Is anyone doing anything cool with tiny language models?":

"A user inquires about the use of tiny language models for interesting applications, such as spam filtering and cookie notice detection. A developer shares their experience with using Ollama to respond to SMS spam with unique personas, like a millennial gymbro or a 19th-century British gentleman. Another user highlights the effectiveness of 3B and 7B language models for cookie notice detection, with decent performance achieved through prompt engineering."

I originally used LLaMA 3:Instruct for the backend, which performs much better, but recently started experimenting with the smaller LLaMA 3.2:1B model.

It’s been cool seeing other people’s ideas too. Curious—does anyone have suggestions for small models that are good for summaries?

Feel free to check it out or make changes: https://github.com/k-zehnder/gophersignal
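For anyone curious, the core summarization call against a small local model through the ollama Python client is roughly this (a sketch, not GopherSignal's actual code; the prompt and model tag are illustrative):

    # Minimal article-summary call against a local Ollama server.
    # Assumes `pip install ollama` and `ollama pull llama3.2:1b`.
    import ollama

    def summarize(article_text: str) -> str:
        response = ollama.chat(
            model="llama3.2:1b",
            messages=[
                {"role": "system", "content": "Summarize the article in 2-3 sentences."},
                {"role": "user", "content": article_text},
            ],
        )
        return response["message"]["content"]

    print(summarize("Full article text goes here..."))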

I've made a tiny ~1M-parameter model, largely based on Karpathy's nanoGPT with a few more features added on top, that can generate random Magic: The Gathering cards.

I don't have a pre-trained model to share, but you can train one yourself from the git repo, assuming you have an Apple Silicon Mac.

https://github.com/jlwitthuhn/TCGGPT
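For a sense of scale, here is a back-of-the-envelope parameter count for a GPT this small (the dimensions are illustrative and not taken from TCGGPT):

    # Rough parameter count for a small character-level GPT.
    # Illustrative dimensions only, not TCGGPT's actual config.
    n_layer, n_head, n_embd = 4, 4, 128
    vocab_size, block_size = 96, 256  # character-level vocab, short context

    embeddings = vocab_size * n_embd + block_size * n_embd  # token + position tables
    per_block = 12 * n_embd * n_embd  # attention (4*d^2) + MLP (8*d^2), roughly
    total = embeddings + n_layer * per_block

    print(f"~{total / 1e6:.2f}M parameters")  # ~0.83M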

I had an LLM create a playlist for me.

I’m tired of the bad playlists I get from algorithms, so I made a specific playlist with Llama 2, based on several songs I like. I started with 50, removed any I didn’t like, and added more to fill in the gaps. The small models were pretty good at this. Now I have a decent fixed playlist. It does get “tired” after a few weeks and I need to add more to it. I’ve never been able to do this myself with more than a dozen songs.
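Not this commenter's setup, just a minimal sketch of that kind of prompt against a small local model via the ollama Python client (the model tag and seed songs are made up):

    # Ask a small local model for playlist candidates seeded from songs you like.
    # Assumes `pip install ollama` and a pulled small model such as llama3.2:3b.
    import ollama

    seed_songs = [
        "Radiohead - Weird Fishes",
        "Boards of Canada - Roygbiv",
        "Khruangbin - Maria Tambien",
    ]

    prompt = (
        "I like these songs:\n"
        + "\n".join(f"- {s}" for s in seed_songs)
        + "\n\nSuggest 50 similar songs, one per line, as 'Artist - Title'. No commentary."
    )

    response = ollama.chat(model="llama3.2:3b", messages=[{"role": "user", "content": prompt}])
    for line in response["message"]["content"].splitlines():
        if line.strip():
            print(line.strip("- ").strip())  # prune by hand, then re-prompt to fill gaps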

Interesting! Sadly, I've found that even more capable models really fail at music recommendations for me.