
Mistral AI Releases Forge

https://mistral.ai/news/forge
I like Mistral, it hits the exact sweet spot between cost and my data staying in the EU, without a significant drop in quality, but man are their model naming conventions confusing af. They mention they have a model called Devstral 2, which is neither Codestral nor Devstral. I want to use it, but the API only lists devstral-2512, devstral-latest, devstral-medium-latest, devstral-medium-2507, devstral-small, devstral-small-2507.

I think devstral-latest should be it, no? So I write to support and get an answer 12 hours later that says oh, no, Devstral 2 is definitely called devstral 2, and then a page of instructions on how to set it up in IntelliJ... generated with AI. The screens it refers to don't exist and never did.
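
For what it's worth, you can ask the API itself which IDs your key can see. A minimal sketch, assuming the standard /v1/models listing route and a MISTRAL_API_KEY env var:

    import os
    import requests

    # Print every model ID visible to this key; assumes Mistral's
    # standard /v1/models listing route and a MISTRAL_API_KEY env var.
    resp = requests.get(
        "https://api.mistral.ai/v1/models",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        timeout=30,
    )
    resp.raise_for_status()
    for model in resp.json()["data"]:
        print(model["id"])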

Don't sleep on Mistral. Highly underrated as a general-service LLM. Cheaper, too. Their emphasis on bespoke modelling over generalized megaliths will pay off. There are all kinds of specialized datasets and restricted-access stores that can benefit from their approach. Especially in the highly regulated EU.

Not everyone is obsessed with code generation. There is a whole world out there.

Yes, since it's not American, it will be the de facto choice for most big European companies.
Is this the best Grok alternative?
I am rooting for Mistral with their different approach: not really competing on the largest and most advanced models, instead doing custom engineering for customers and generally serving the needs of EU customers.
their OCR model is goated
first, there was .ai

next, it sounds like it's going to be .eu

but what about ai.eu

> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.

> Post-training methods allow teams to refine model behavior for specific tasks and environments.

How do you suppose this works? They say "pretraining" but I'm certain that the amount of clean data available in proper dataset format is not nearly enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT and then "post-training" is ... more SFT?

There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
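
If that guess is right, the pipeline could be as simple as the sketch below. The prompt, chunking, and JSONL output format are all my own assumptions, not anything Mistral documents for this product; only the /v1/chat/completions route is the real API.

    import json
    import os
    import requests

    HEADERS = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

    def synthesize(passage: str, n: int = 3) -> str:
        """Ask a strong teacher model to expand one company-doc chunk
        into Q/A pairs. Prompt and output format are hypothetical."""
        resp = requests.post(
            "https://api.mistral.ai/v1/chat/completions",
            headers=HEADERS,
            json={
                "model": "mistral-large-latest",
                "messages": [{
                    "role": "user",
                    "content": f"Write {n} question/answer pairs grounded only "
                               f"in the following passage:\n\n{passage}",
                }],
            },
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    with open("synthetic.jsonl", "w") as out:
        for chunk in ["<company doc chunk 1>", "<company doc chunk 2>"]:
            out.write(json.dumps({"text": synthesize(chunk)}) + "\n")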

Mistral has been releasing some cool stuff. Definitely behind on frontier models, but they are working a different angle. Was just talking at work about how hard model training is for a small company, so we'd probably never do it. But with tools like this, and the new Unsloth release, training feels more within reach.
How many proprietary use cases truly need pre-training or even fine-tuning as opposed to a RAG approach? And at what point does it make sense to pre-train/fine-tune? Curious.
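
For reference, the RAG side of that comparison can be this small. A sketch using Mistral's embeddings endpoint and its mistral-embed model; the documents and query are placeholders:

    import os
    import numpy as np
    import requests

    HEADERS = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

    def embed(texts):
        # Mistral's embeddings endpoint with the "mistral-embed" model.
        resp = requests.post(
            "https://api.mistral.ai/v1/embeddings",
            headers=HEADERS,
            json={"model": "mistral-embed", "input": texts},
            timeout=30,
        )
        resp.raise_for_status()
        return np.array([d["embedding"] for d in resp.json()["data"]])

    docs = ["Refunds are accepted within 30 days.",
            "Customer data is stored in EU data centers."]
    doc_vecs = embed(docs)

    query = "How long do customers have to request a refund?"
    q_vec = embed([query])[0]

    # Cosine similarity, then stuff the best chunk into the prompt.
    scores = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    context = docs[int(scores.argmax())]
    prompt = f"Answer using only this context:\n{context}\n\nQ: {query}"
    print(prompt)
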
This is definitely the smart path for making $$ in AI. I noticed MongoDB is also going into this market with https://www.voyageai.com/ targeting business RAG applications and offering consulting for company-specific models.
Huh. I initially thought this was just another fine-tuning endpoint. But apparently they are partnering with customers on the pretraining side as well. And RL too? Jeez, RL environments are really hard to get right. Best wishes, I guess.
They mention pretraining too, which surprises me. I thought that was prohibitively expensive?

It's feasible for small models, but I thought small models were not reliable for factual information?
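
Expensive, but no longer absurd for small models. Back-of-envelope with the common FLOPs ≈ 6 · params · tokens approximation; the hardware and price numbers below are my own rough assumptions:

    # Rough pretraining cost via FLOPs ~ 6 * params * tokens.
    params = 7e9        # a "small" 7B-parameter model
    tokens = 2e12       # Llama-style token budget (assumption)
    flops = 6 * params * tokens               # ~8.4e22 FLOPs

    effective = 1e15 * 0.4                    # H100-class peak at ~40% utilization (assumption)
    gpu_hours = flops / effective / 3600      # ~58,000 GPU-hours
    print(f"{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * 2:,.0f} at $2/GPU-hour")

So a 7B run lands in the low six figures under those assumptions: feasible for a company, still far from trivial.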

The future of AI is specialization, not just achieving benevolent knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.
The fine-tuning endpoint is deprecated according to the API docs. Is this the replacement?

https://docs.mistral.ai/api/endpoint/deprecated/fine-tuning
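
If you want to check what you still have running there, you can list jobs directly. A sketch, assuming the /v1/fine_tuning/jobs path from those docs still answers:

    import os
    import requests

    # List jobs on the (now-deprecated) fine-tuning endpoint; assumes
    # the /v1/fine_tuning/jobs path from the linked docs still answers.
    resp = requests.get(
        "https://api.mistral.ai/v1/fine_tuning/jobs",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        timeout=30,
    )
    resp.raise_for_status()
    for job in resp.json().get("data", []):
        print(job["id"], job.get("status"))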

How does this compare to fine-tuning?
Is training or FT > context? Anyone have experience?

Is it possible to retrain daily or hourly as info changes?
