
Mistral AI Releases Forge

https://mistral.ai/news/forge
I like Mistral, it hits the exact sweet spot between cost and my data staying in the EU, without a significant drop in quality, but man are their model naming conventions confusing af. They mention they have a model called Devstral 2, which is neither Codestral nor Devstral. I want to use it, but the API only lists devstral-2512, devstral-latest, devstral-medium-latest, devstral-medium-2507, devstral-small, devstral-small-2507.

I think devstral-latest should be it, no? So I write to support and get an answer 12 hours later that says oh, no, Devstral 2 is definitely called devstral 2, and then a page of instructions on how to set it up in IntelliJ... generated with AI. The screens it refers to don't exist and never did.
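
For what it's worth, you can ask the API itself which IDs your key can see. A minimal sketch, assuming the standard /v1/models listing route and a MISTRAL_API_KEY env var:

    import os
    import requests

    # Print every model ID visible to this key; assumes Mistral's
    # standard /v1/models listing route and a MISTRAL_API_KEY env var.
    resp = requests.get(
        "https://api.mistral.ai/v1/models",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        timeout=30,
    )
    resp.raise_for_status()
    for model in resp.json()["data"]:
        print(model["id"])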

Don't sleep on Mistral. Highly underrated as a general-service LLM. Cheaper, too. Their emphasis on bespoke modelling over generalized megaliths will pay off. There are all kinds of specialized datasets and restricted-access stores that can benefit from their approach. Especially in the highly regulated EU.

Not everyone is obsessed with code generation. There is a whole world out there.

Yes, since it's not American, it will be the de facto choice for most big European companies.
Is this the best Grok alternative?
I am rooting for Mistral with their different approach: not really competing on the largest and most advanced models, instead doing custom engineering for customers and generally serving the needs of EU customers.
their OCR model is goated
first, there was .ai

next, it sounds like it's going to be .eu

but what about ai.eu

> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.

> Post-training methods allow teams to refine model behavior for specific tasks and environments.

How do you suppose this works? They say "pretraining" but I'm certain that the amount of clean data available in proper dataset format is not nearly enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT and then "post-training" is ... more SFT?

There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low-resolution distillation, I would imagine. Hmm.
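
If that guess is right, the pipeline could be as simple as the sketch below. The prompt, chunking, and JSONL output format are all my own assumptions, not anything Mistral documents for this product; only the /v1/chat/completions route is the real API.

    import json
    import os
    import requests

    HEADERS = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

    def synthesize(passage: str, n: int = 3) -> str:
        """Ask a strong teacher model to expand one company-doc chunk
        into Q/A pairs. Prompt and output format are hypothetical."""
        resp = requests.post(
            "https://api.mistral.ai/v1/chat/completions",
            headers=HEADERS,
            json={
                "model": "mistral-large-latest",
                "messages": [{
                    "role": "user",
                    "content": f"Write {n} question/answer pairs grounded only "
                               f"in the following passage:\n\n{passage}",
                }],
            },
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    with open("synthetic.jsonl", "w") as out:
        for chunk in ["<company doc chunk 1>", "<company doc chunk 2>"]:
            out.write(json.dumps({"text": synthesize(chunk)}) + "\n")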

Mistral has been releasing some cool stuff. Definitely behind on frontier models, but they are working a different angle. Was just talking at work about how hard model training is for a small company, so we'd probably never do it. But with tools like this, and the new Unsloth release, training feels more within reach.
How many proprietary use cases truly need pre-training or even fine-tuning as opposed to a RAG approach? And at what point does it make sense to pre-train/fine-tune? Curious.
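
For reference, the RAG side of that comparison can be this small. A sketch using Mistral's embeddings endpoint and its mistral-embed model; the documents and query are placeholders:

    import os
    import numpy as np
    import requests

    HEADERS = {"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"}

    def embed(texts):
        # Mistral's embeddings endpoint with the "mistral-embed" model.
        resp = requests.post(
            "https://api.mistral.ai/v1/embeddings",
            headers=HEADERS,
            json={"model": "mistral-embed", "input": texts},
            timeout=30,
        )
        resp.raise_for_status()
        return np.array([d["embedding"] for d in resp.json()["data"]])

    docs = ["Refunds are accepted within 30 days.",
            "Customer data is stored in EU data centers."]
    doc_vecs = embed(docs)

    query = "How long do customers have to request a refund?"
    q_vec = embed([query])[0]

    # Cosine similarity, then stuff the best chunk into the prompt.
    scores = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    context = docs[int(scores.argmax())]
    prompt = f"Answer using only this context:\n{context}\n\nQ: {query}"
    print(prompt)
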
This is definitely the smart path for making $$ in AI. I noticed MongoDB is also going into this market with https://www.voyageai.com/ targeting business RAG applications and offering consulting for company-specific models.
Huh. I initially thought this was just another fine-tuning endpoint. But apparently they are partnering with customers on the pretraining side as well. And RL too? Jeez, RL environments are really hard to get right. Best wishes, I guess.
They mention pretraining too, which surprises me. I thought that was prohibitively expensive?

It's feasible for small models, but I thought small models were not reliable for factual information?
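
Expensive, but no longer absurd for small models. Back-of-envelope with the common FLOPs ≈ 6 · params · tokens approximation; the hardware and price numbers below are my own rough assumptions:

    # Rough pretraining cost via FLOPs ~ 6 * params * tokens.
    params = 7e9        # a "small" 7B-parameter model
    tokens = 2e12       # Llama-style token budget (assumption)
    flops = 6 * params * tokens               # ~8.4e22 FLOPs

    effective = 1e15 * 0.4                    # H100-class peak at ~40% utilization (assumption)
    gpu_hours = flops / effective / 3600      # ~58,000 GPU-hours
    print(f"{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * 2:,.0f} at $2/GPU-hour")

So a 7B run lands in the low six figures under those assumptions: feasible for a company, still far from trivial.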

The future of AI is specialization, not just achieving benevolent knowledge as fast as we can at the expense of everything and everyone along the way. I appreciate and applaud this approach. I am looking into a similar product myself. Good stuff.
The fine-tuning endpoint is deprecated according to the API docs. Is this the replacement?

https://docs.mistral.ai/api/endpoint/deprecated/fine-tuning
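
If you want to check what you still have running there, you can list jobs directly. A sketch, assuming the /v1/fine_tuning/jobs path from those docs still answers:

    import os
    import requests

    # List jobs on the (now-deprecated) fine-tuning endpoint; assumes
    # the /v1/fine_tuning/jobs path from the linked docs still answers.
    resp = requests.get(
        "https://api.mistral.ai/v1/fine_tuning/jobs",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        timeout=30,
    )
    resp.raise_for_status()
    for job in resp.json().get("data", []):
        print(job["id"], job.get("status"))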

How does this compare to fine-tuning?
Is training or FT > context? Anyone have experience?

Is it possible to retrain daily or hourly as info changes?
