Since we own the metrics and the algorithms we've spent the last year iterating on with our users, we strike a balance between giving engineers the ability to customize our metric algorithms and evaluation techniques, and letting them bring everything to the cloud for their organization when they're ready.
This brings me to the tools that do have their own metrics and evals. Including us, there are only 3 companies out there that do this to a good extent (excuse me for this one), and we're the only one with a self-serve platform, so any open-source user can get the benefit of Confident AI as well.
That's not the only difference, because if you compare DeepEval's metrics on the finer details (which I think is very important), we provide the most customizable metrics out there. This includes the research-backed, SOTA LLM-as-a-judge metric G-Eval for any custom criteria, and the recently released DAG metric, which is decision-based and virtually deterministic despite being LLM-evaluated. This means that as users' use cases get more and more specific, they can stick with our metrics and keep benefiting from DeepEval's ecosystem as well (metric caching, cost tracking, parallelization, Pytest integration for CI/CD, Confident AI, etc.).
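To make the customization point concrete, here's a minimal sketch of defining a criteria-based G-Eval metric with DeepEval. It assumes the `deepeval` pip package and reflects the documented API as I know it; the criteria text, test case values, and threshold are made up for illustration, and parameter names may shift slightly between versions.

```python
from deepeval.metrics import GEval
from deepeval.test_case import LLMTestCase, LLMTestCaseParams

# A custom criteria-based metric: G-Eval scores the output against whatever
# criteria you describe in plain language.
correctness = GEval(
    name="Correctness",
    criteria="Determine whether the actual output is factually consistent with the expected output.",
    evaluation_params=[
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
    threshold=0.7,  # illustrative passing threshold
)

test_case = LLMTestCase(
    input="What is the boiling point of water at sea level?",
    actual_output="Water boils at 100 degrees Celsius at sea level.",
    expected_output="100°C (212°F) at standard atmospheric pressure.",
)

correctness.measure(test_case)
print(correctness.score, correctness.reason)
```

The same test case can be reused across metrics, which is where the ecosystem features (caching, cost tracking, parallelization, Pytest integration) come in.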
There's so much more, such as generating synthetic data to get started with testing even if you don't have a prepared test set, and red-teaming for safety testing (so you're not just testing for functionality), but I'm going to stop here for now.
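If you do want a taste of the synthetic data piece before I wrap up, here's a rough sketch of bootstrapping a test set from your own documents. Treat the method and parameter names (`generate_goldens_from_docs`, `document_paths`, `max_goldens_per_context`) as assumptions based on the docs; they may differ across versions, and the file path is hypothetical.

```python
from deepeval.synthesizer import Synthesizer

# Generate synthetic "goldens" (input / expected-output pairs) from documents,
# so you can start evaluating without a hand-written test set.
synthesizer = Synthesizer()
synthesizer.generate_goldens_from_docs(
    document_paths=["knowledge_base.pdf"],  # hypothetical document
    max_goldens_per_context=2,
)

# Inspect what was generated (attribute name assumed from the docs).
for golden in synthesizer.synthetic_goldens:
    print(golden.input)
```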