Ask HN: Is anyone doing anything cool with tiny language models?

I have ollama responding to SMS spam texts. I told it to feign interest in whatever the spammer is selling/buying. Each number gets its own persona, like a millennial gymbro or 19th century British gentleman.

http://files.widloski.com/image10%20(1).png

http://files.widloski.com/image11.png

This is fantastic. How have you hooked up a mobile number to the LLM?
An Android app that forwards to a Python service on a remote workstation over MQTT. I can make a Show HN if people are interested.
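A minimal sketch of that kind of bridge, assuming the paho-mqtt and ollama Python packages (the topic names, model tag, and persona prompt here are placeholders, not the actual ones):

    # SMS -> MQTT -> small local model -> reply published back over MQTT.
    # The Android app is assumed to publish incoming texts to "sms/incoming"
    # and to send whatever appears on "sms/outgoing".
    import json

    import ollama
    import paho.mqtt.client as mqtt

    PERSONA = "You are a 19th-century British gentleman, terribly keen on whatever is being sold."

    def on_message(client, userdata, msg):
        sms = json.loads(msg.payload)  # e.g. {"from": "+1555...", "body": "..."}
        reply = ollama.chat(
            model="llama3.2:3b",  # placeholder model tag
            messages=[
                {"role": "system", "content": PERSONA},
                {"role": "user", "content": sms["body"]},
            ],
        )["message"]["content"]
        client.publish("sms/outgoing", json.dumps({"to": sms["from"], "body": reply}))

    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)  # paho-mqtt 2.x
    client.on_message = on_message
    client.connect("localhost", 1883)
    client.subscribe("sms/incoming")
    client.loop_forever()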
I’d love to see that. Could you simulate iMessage?
Yes it’s possible, but it’s not something you can easily scale.

I had a similar project a few years back that used OS X automations, Shortcuts, and Python to send a message every day to a friend. It required you to be signed in to iMessage on your MacBook.

That was a send operation; I never implemented reading replies, but I know there is a file somewhere that holds a history of your recent iMessages. You would have to parse it on each file update, and that should give you the read operation so you can have a conversation.

Very doable in a few hours, unless something dramatic has changed in how the Messages app works within the last few years.

If you mean hooking this into iMessage, I don't know. I'm willing to bet it's way harder, though, because Apple locks iMessage down.
Why MQTT over HTTP for a low volume, small scale integration?
I’m not OP, but I would hazard a guess that those are the tools that OP has at hand.
Yes, I'd be interested in that!
I have a mini PC with an N100 CPU connected to a small 7" monitor sitting on my desk, under the regular PC. I have Llama 3B (Q4) generating endless stories in different genres and styles. It's fun to glance over at it and read whatever it's in the middle of writing. I gave llama.cpp one CPU core, so it generates slowly enough to read at a normal pace and the CPU fans don't go nuts. Totally not productive or really useful, but I like it.
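For reference, a minimal version of this kind of setup using the llama-cpp-python bindings might look like the following (the model filename, genres, and sampling settings are illustrative assumptions):

    # Endless slow story generation pinned to a single CPU thread.
    # Assumes `pip install llama-cpp-python` and a local Q4 GGUF of a ~3B model.
    import random
    from llama_cpp import Llama

    llm = Llama(
        model_path="llama-3.2-3b-instruct-q4_k_m.gguf",  # illustrative filename
        n_threads=1,  # one core: slow, readable output and quiet fans
        n_ctx=2048,
    )

    genres = ["noir detective", "cozy fantasy", "hard sci-fi", "ghost story"]

    while True:
        prompt = f"Write a short {random.choice(genres)} story.\n\n"
        for chunk in llm(prompt, max_tokens=512, temperature=0.9, stream=True):
            print(chunk["choices"][0]["text"], end="", flush=True)
        print("\n\n---\n")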
Microsoft published a paper on their FLAME model (60M parameters) for Excel formula repair/completion which outperformed much larger models (>100B parameters).

https://arxiv.org/abs/2301.13779

That paper is from over a year ago, and it compared against codex-davinci... which was basically GPT-3, from what I understand. Saying >100B makes it sound a lot more impressive than it is in today's context... 100B models today are a lot more capable. The researchers also compared against a couple of other ancient(/irrelevant today), small models that don't give me much insight.

FLAME seems like a fun little model, and 60M is truly tiny compared to other LLMs, but I have no idea how good it is in today's context, and it doesn't seem like they ever released it.

I would like to disagree with calling it irrelevant. If anything, the 100B models are irrelevant in this context and should be seen as a "fun inclusion" rather than a serious baseline worth comparing against. Outperforming a 100B model at the time makes for a fun bragging point, but it's not the core value of the method or the paper.

Running a prompt against every single cell of a 10k-row document was never going to happen with a large model. Even using a transformer architecture in the first place can be seen as ludicrous overkill, though it's feasible on modern machines.

So I'd say the paper is very relevant, and the top commenter in this very thread demonstrated their own homegrown version with a very nice use case (sorting paper abstracts and titles to put together a summary paper).

> Running a prompt against every single cell of a 10k row document was never gonna happen with a large model

That isn’t the main point of FLAME, as I understood it. The main point was to help you when you’re editing a particular cell. codex-davinci was used for real time Copilot tab completions for a long time, I believe, and editing within a single formula in a spreadsheet is far less demanding than editing code in a large document.

After I posted my original comment, I realized I should have pointed out that I’m fairly sure we have 8B models that handily outperform codex-davinci these days… further driving home how irrelevant the claim of “>100B” was here (not talking about the paper). Plus, an off the shelf model like Qwen2.5-0.5B (a 494M model) could probably be fine tuned to compete with (or dominate) FLAME if you had access to the FLAME training data — there is probably no need to train a model from scratch, and a 0.5B model can easily run on any computer that can run the current version of Excel.

You may disagree, but my point was that claiming a 60M model outperforms a 100B model just means something entirely different today. Putting that in the original comment higher in the thread creates confusion, not clarity, since the models in question are very bad compared to what exists now. No one had clarified that the paper was over a year old until I commented… and FLAME was being tested against models that seemed to be over a year old even when the paper was published. I don’t understand why the researchers were testing against such old models even back then.

But I feel we're coming full circle. These small models are not generalists, and thus not really LLMs, at least in terms of objective. Recently there has been a rise of "specialized" models that provide a lot of value, but that's not what we were sold on with LLMs.
Specialized models still work much better for most tasks. Really, we need an LLM to understand the input and then hand it off to a specialized model that actually provides good results.
I think playing word games about what really counts as an LLM is a losing battle. It has become a marketing term, mostly. It’s better to have a functionalist point of view of “what can this thing do”.
https://gophersignal.com – I built GopherSignal!

It's a lightweight tool that summarizes Hacker News articles. For example, here’s what it outputs for this very post, "Ask HN: Is anyone doing anything cool with tiny language models?":

"A user inquires about the use of tiny language models for interesting applications, such as spam filtering and cookie notice detection. A developer shares their experience with using Ollama to respond to SMS spam with unique personas, like a millennial gymbro or a 19th-century British gentleman. Another user highlights the effectiveness of 3B and 7B language models for cookie notice detection, with decent performance achieved through prompt engineering."

I originally used LLaMA 3:Instruct for the backend, which performs much better, but recently started experimenting with the smaller LLaMA 3.2:1B model.

It’s been cool seeing other people’s ideas too. Curious—does anyone have suggestions for small models that are good for summaries?

Feel free to check it out or make changes: https://github.com/k-zehnder/gophersignal
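For anyone curious, the core summarization call against a small local model through the ollama Python client is roughly this (a sketch, not GopherSignal's actual code; the prompt and model tag are illustrative):

    # Minimal article-summary call against a local Ollama server.
    # Assumes `pip install ollama` and `ollama pull llama3.2:1b`.
    import ollama

    def summarize(article_text: str) -> str:
        response = ollama.chat(
            model="llama3.2:1b",
            messages=[
                {"role": "system", "content": "Summarize the article in 2-3 sentences."},
                {"role": "user", "content": article_text},
            ],
        )
        return response["message"]["content"]

    print(summarize("Full article text goes here..."))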

I've made a tiny ~1M-parameter model, largely based on Karpathy's nanoGPT with a few more features added on top, that can generate random Magic: The Gathering cards.

I don't have a pre-trained model to share, but you can train one yourself from the git repo, assuming you have an Apple Silicon Mac.

https://github.com/jlwitthuhn/TCGGPT
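For a sense of scale, here is a back-of-the-envelope parameter count for a GPT this small (the dimensions are illustrative and not taken from TCGGPT):

    # Rough parameter count for a small character-level GPT.
    # Illustrative dimensions only, not TCGGPT's actual config.
    n_layer, n_head, n_embd = 4, 4, 128
    vocab_size, block_size = 96, 256  # character-level vocab, short context

    embeddings = vocab_size * n_embd + block_size * n_embd  # token + position tables
    per_block = 12 * n_embd * n_embd  # attention (4*d^2) + MLP (8*d^2), roughly
    total = embeddings + n_layer * per_block

    print(f"~{total / 1e6:.2f}M parameters")  # ~0.83M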

I had an LLM create a playlist for me.

I’m tired of the bad playlists I get from algorithms, so I made a specific playlist with Llama 2, based on several songs I like. I started with 50, removed any I didn’t like, and added more to fill in the gaps. The small models were pretty good at this. Now I have a decent fixed playlist. It does get “tired” after a few weeks and I need to add more to it. I’ve never been able to do this myself with more than a dozen songs.
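Not this commenter's setup, just a minimal sketch of that kind of prompt against a small local model via the ollama Python client (the model tag and seed songs are made up):

    # Ask a small local model for playlist candidates seeded from songs you like.
    # Assumes `pip install ollama` and a pulled small model such as llama3.2:3b.
    import ollama

    seed_songs = [
        "Radiohead - Weird Fishes",
        "Boards of Canada - Roygbiv",
        "Khruangbin - Maria Tambien",
    ]

    prompt = (
        "I like these songs:\n"
        + "\n".join(f"- {s}" for s in seed_songs)
        + "\n\nSuggest 50 similar songs, one per line, as 'Artist - Title'. No commentary."
    )

    response = ollama.chat(model="llama3.2:3b", messages=[{"role": "user", "content": prompt}])
    for line in response["message"]["content"].splitlines():
        if line.strip():
            print(line.strip("- ").strip())  # prune by hand, then re-prompt to fill gaps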

Interesting! Sadly, I've found that even more capable models really fail at music recommendations for me.