Hacker News new | past | comments | ask | show | jobs | submit

Monorepo – Our Experience

https://ente.io/blog/monorepo-retrospective/
Every monorepo I've ever met (n=3) has some kind of radioactive DMZ that everybody is afraid to touch because it's not clear who owns it but it is clear from its quality that you don't want to be the last person who touched it because then maybe somebody will think that you own it. It's usually called "core" or somesuch.

Separate repos for each team means that when two teams own components that need to interact, they have to expose a "public" interface to the other team--which is the kind of disciplined engineering work that we should be striving for. The monorepo-alternative is that you solve it in the DMZ where it feels less like engineering and more like some kind of multiparty political endeavor where PR reviewers of dubious stakeholder status are using the exercise to further agendas which are unrelated to the feature except that it somehow proves them right about whatever architectural point is recently contentious.

Plus, it's always harder to remove something from the DMZ than to add it, so it's always growing and there's this sort of gravitational attractor which, eventually starts warping time such that PR's take longer to merge the closer they are to it.

Better to just do the "hard" work of maintaining versioned interfaces with documented compatibility (backed by tests). You can always decide to collapse your codebase into a black hole later--but once you start on that path you may never escape.

Since we are indulging in generalizations from our past. With separate repos you end up with 10 "cores" that are radioctive DMZ's everybody is afraid to touch. And those "disciplined" public API's will be universally hated by everyone who consumes them.

Neither a monorepo nor separate repos will result in people being disciplined. If you already have the discipline to do separate repositories correctly then you'll be fine with a monorepo.

So I guess it's six on one hand, half dozen in the other.

loading story #42071665
loading story #42071587
loading story #42071606

    > Moving to a monorepo didn't change much, and what minor changes it made have been positive.
I'm not sure that this statement in the summary jives with this statement from the next section:

    > In the previous, separate repository world, this would've been four separate pull requests in four separate repositories, and with comments linking them together for posterity.
    > 
    > Now, it is a single one. Easy to review, easy to merge, easy to revert.
IMO, this is a huge quality of life improvement and prevents a lot of mistakes from not having the right revision synced down across different repos. This alone is a HUGE improvement where a dev doesn't accidentally end up with one repo in this branch and forgot to pull this other repo at the same branch and get weird issues due to this basic hassle.

When I've encountered this, we've had to use another repo to keep scripts that managed this. But this was also sometimes problematic because each developer's setup had to be identical on their local file system (for the script to work) or we had to each create a config file pointing to where each repo lived.

This also impacts tracking down bugs and regression analysis; this is much easier to manage in a mono-repo setup because you can get everything at the same revision instead of managing synchronization of multiple repos to figure out where something broke.

loading story #42067413
My only counter argument here, is when those 4 things deploy independently. Sometimes, people will get tricked into thinking a code change is atomic because it is in one commit, when it will lead to a mixed fleet because of deployment realities. In that world, having them separate is easier to work with, as you may have to revert one of the deployments separately from the others.
loading story #42069461
If the version numbers of all services built from the PR are identical, you at least have a pretty clear trail to figuring out WTF happened.

Even with a few services, we saw some pretty crunchy issues with people not understanding that service A had version 1.3.1234 of a module and Service B had version 1.3.1245 and that was enough skew to cause problems.

Distinct repos tend to have distinct builds, and sooner or later one of the ten you're building will glitch out and have to be run twice, or the trigger will fail and it won't build at all until the subsequent merge, and having numbers that are close results in a false sense of confidence.

loading story #42069599
loading story #42070122
loading story #42069618
loading story #42067533
loading story #42066140
loading story #42064679
loading story #42070436
loading story #42071662
loading story #42069502
Without indicating my personal feelings on monorepo vs polyrepo, or expressing any thoughts about the experience shared here, I would like to point out that open-source projects have different and sometimes conflicting needs compared to proprietary closed-source projects. The best solution for one is sometimes the extreme opposite for the other.

In particular many build pipelines involving private sources or artifacts become drastically more complicated than their those of publicly available counterparts.

loading story #42066976
loading story #42071365
To me, monorepo vs multi-repo is not about the code organization, but about the deployment strategy. My rule is that there should be a 1:1 relation between a repository and a release/deployment.

If you do one big monolithic deploy, one big monorepo is ideal. (Also, to be clear, this is separate from microservice vs monolithic app: your monolithic deploy can be made up of as many different applications/services/lambdas/databases as makes sense). You don't have to worry about cross-compatibility between parts of your code, because there's never a state where you can deploy something incompatible, because it all deploys at once. A single PR makes all the changes in one shot.

The other rule I have is that if you want to have individual repos with individual deployments, they must be both forward- and backwards-compatible for long enough that you never need to do a coordinated deploy (deploying two at once, where everything is broken in between). If you have to do coordinated deploys, you really have a monolith that's just masquerading as something more sophisticated, and you've given up the biggest benefits of both models (simplicity of mono, independence of multi).

Consider what happens with a monorepo with parts of it being deployed individually. You can't checkout any specific commit and mirror what's in production. You could make multiple copies of the repo, checkout a different commit on each one, then try to keep in mind which part of which commit is where -- but this is utterly confusing. If you have 5 deployments, you now have 4 copies of any given line of code on your system that are potentially wrong. It becomes very hard to not accidentally break compatibility.

TL;DR: Figure out your deployment strategy, then make your repository structure mirror that.

It doesn't have to be that way.

You can have a mono-repo and deploy different parts of the repo as different services.

You can have a mono-repo with a React SPA and a backend service in Go. If you fix some UI bug with a button in the React SPA, why would you also deploy the backend?

loading story #42066182
You wouldn't, but making a repo collection into a mono-repo means your mono-deploy needs to be split into a multi-maybe-deploy.

As always, complexity merely moves around when squeezed, and making commits/PRs easier means something else, somewhere else gets less easy.

It is something that can be made better of course, having your CI and CD be a bit smarter and more modular means you can now do selective builds based on what was actually changed, and selective releases based on what you actually want to release (not merely what was in the repo at a commit, or whatever was built).

But all of that needs to be constructed too, just merging some repos into one doesn't do that.

If you don't deploy in tandem, you need to test forwards & backwards compatibility. That's tough with either a monorepo or separate repos, but arguably it'd be simple with separate repos.
loading story #42066300
This mirrors my own experience in the SaaS world. Anytime things move towards multiple artifacts/pipelines in one repo; trying to understand what change existed where and when seems to always become very difficult.

Of course the multirepo approach means you do this dance a lot more: - Create a change with backwards compatibility and tombstones (e.g. logs for when backward compatibility is used) - Update upstream systems to the new change - Remove backwards compatibility and pray you don't have a low frequency upstream service interaction you didn't know about

While the dance can be a pain - it does follow a more iterative approach with reduced blast radiuses (albeit many more of them). But, all in all, an acceptable tradeoff.

Maybe if I had more familiarity in mature tooling around monorepos I might be more interested in them. But alas not a bridge I have crossed, or am pushed to do so just at the moment.

Ok, but the more interesting part - how did you solve the CI/CD part and how does it compare to a multirepo?
loading story #42064903
loading story #42064722
loading story #42065313
loading story #42071705
loading story #42067638
I think the big issue around monorepo is when a company puts completely different projects together inside a single repo.

In this article almost everything makes sense to me (because that's what I have been doing most of my career) but they put their OTP app inside which suddenly makes no sense. And you can see the problem in the CI they have dedicated files just for this App and probably very few common code with the rest.

IMO you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project) and if needed a dedicated repo for a shared library.

loading story #42065000
loading story #42069443
loading story #42069823
{"deleted":true,"id":42064514,"parent":42062074,"time":1730909688,"type":"comment"}
loading story #42070241
We're transitioning from a SVN monorepo to Git. We've considered doing a kind of best-of-both-worlds approach.

Some core stuff into separate libraries, consumed as nuget packages by other projects. Those libraries and other standalone projects in separate repos.

Then a "monorepo" for our main product, where individual projects for integrations etc will reference non-nuget libraries directly.

That is, tightly coupled code goes into the monorepo, the rest in separate repos.

Haven't taken the plunge just yet tho, so not sure how well it'll actually work out.

monorepos are appropriate for a single project with many sub parts but one or two artifacts on any given release build. But they fall apart when you have multiple products in the monorepo, each with different release schedules.

As soon as you add a second separate product that uses a different subset of any code in the repo, you should consider breaking up the monorepo. If the code is "a bunch of libraries" and "one or more end user products" it becomes even more imperative to consider breaking down stuff..

Having worked on monorepos where there are 30+ artifacts, multiple ongoing projects that each pull the monorepo in to different incompatible versions, and all of which have their own lifetime and their own release cycle - monorepo is the antithesis of a good idea.

loading story #42065786
loading story #42067992
Some thoughts:

1) Comparing a photo storage app to the Linux kernel doesn't make much sense. Just because a much bigger project in an entirely different (and more complex) domain uses monorepos, doesn't mean you should too.

2) What the hell is a monorepo? I feel dumb for asking the question, and I feel like I missed the boat on understanding it, because no one defines it anymore. Yet I feel like every mention of monorepo is highly dependent on the context the word is used in. Does it just mean a single version-controlled repository of code?

3) Can these issues with sync'ing repos be solved with better use of `git submodule`? It seems to be designed exactly for this purpose. The author says "submodules are irritating" a couple times, but doesn't explain what exactly is wrong with them. They seem like a great solution to me, but I also only recently started using them in a side project

loading story #42066078
loading story #42066192