In academia, the cost of running computers and software deployments is usually greatly underestimated (I speak from experience). If you are a for-profit company and compute the full economic cost of running a database deployment, including training, devops, alerting, 24/7 maintenance, upgrades, backups and everything else, you easily arrive at some $60,000 per year.
Managed services can beat this price by a huge margin, although they seem expensive at first glance.
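As a purely illustrative back-of-the-envelope (every line item below is a made-up assumption, not a figure from the comment above), such a total adds up quickly:

```python
# Hypothetical annual cost breakdown for self-hosting a database
# deployment. The individual figures are assumptions for illustration.
annual_costs = {
    "devops engineer time (approx. 0.25 FTE)": 30_000,
    "training and onboarding":                  5_000,
    "24/7 on-call and alerting":               10_000,
    "upgrades and maintenance windows":         8_000,
    "backups and storage":                      7_000,
}

total = sum(annual_costs.values())
print(f"estimated total: ${total:,} per year")  # estimated total: $60,000 per year
```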
(ArangoDB developer here) MongoDB is a document store; ArangoDB is multi-model, so you get graphs, key/value and search as additional benefits. The query language AQL is also a huge plus. Price comparisons are always hard, but for similar deployments Oasis will be a bit cheaper. One really has to look at the details here, in particular with respect to sharding and resilience.
Disclaimer: I am one of the core developers of ArangoDB and I work for the company ArangoDB.
Yes, ArangoDB is young (6 years) in comparison to PostgreSQL (30 years).
Yes, PostgreSQL is a fantastic database with an amazing open source community and is not going away any time soon.
Yes, PostgreSQL is a good choice for a project in which you need a relational single server database.
However, ArangoDB actually has a different value proposition, so in a sense, it is not a direct competitor.
ArangoDB is native multi-model, which means it is a document store (JSON), a graph database and a key/value store, all in one engine, with a uniform query language that supports all three data models and lets you mix them, even in a single query.
Furthermore, ArangoDB is designed as a fault-tolerant distributed and scalable system.
In addition, it is extensible with user-defined JavaScript code running in a sandbox inside the database server.
Finally, we do our best to make this distributed system devops friendly with good tooling and k8s integration.
Last but not least, ArangoDB is backed by a company which offers professional support.
Therefore, any well-informed decision for a project needs to look at the value propositions and capabilities, not only at age and experience. That said, maturity is of course a big argument, since people are, rightfully so, conservative with their databases.
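To make the "mix data models in a single query" point concrete, here is a small AQL sketch; the document collection `users` and the edge collection `knows` are hypothetical names for this example:

```aql
// Filter documents by attribute, then traverse the graph from each match:
// a document-store FILTER combined with a 1..2-step graph traversal,
// all in one AQL query.
FOR user IN users
  FILTER user.active == true
  FOR friend IN 1..2 OUTBOUND user knows
    RETURN { user: user.name, friend: friend.name }
```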
>is designed as a fault-tolerant distributed and scalable system
What is the consistency model and have you validated that it actually works as designed (for example with Jepsen)? I didn't find anything detailed on your website.
As explained in the article: the "library part" of our executables is very small in comparison to the total executable size. Furthermore, the memory usage of the database itself is usually much greater than the size of the executable. Finally, one rarely deploys multiple instances of an ArangoDB server on the same machine, so the savings from shared libraries are also not that great.
Therefore the "waste" is very minor and the advantages outweigh the slight increase in memory usage.
After compilation the executables contain symbols and debug information; for the release we simply strip them. It is easy to get or provide versions with debugging information. We plan to provide deb and rpm packages containing stripped binaries plus separate debug packages. This works but is not yet published.
Author of posted article here: thanks for the additional pointers. It seems that graphistry excels at visualization. Essentially, your offering confirms the main story of the article: make more out of your (graph) data by extracting it from Hadoop to a different tool.
And obviously, one should use the right tool for the purpose. I think Graphistry is a good choice for graph visualization, graph databases like ArangoDB or Neo4j will be good at ad hoc traversals, and multi-model databases like ArangoDB or OrientDB will be good at a wide range of ad hoc queries. Anyway, thanks again for the pointers.
Yep. Maybe the observation is (1) data has gravity -- it was originally in another non-graph-specific DB -- and (2) the graph structure part is normally small. So we indeed see a lot of extraction into easier-to-use systems.
The nuance being... with stuff like data science notebooks and pandas, the people skilled enough to do the extraction are also skilled enough that it's easier to just use pandas. The exception is repeat work, or when it is for regular analysts. Friendly query languages like Neo4j's Cypher help there. Not sure what Arango supports... Gremlin? Proprietary?
Graphistry's environment is agnostic, and _not_ a database, so it'd be wrong of me to advocate teams drop their system of record and use just us ;-) We ended up building a visual "playbook" investigation environment to help teams streamline these scenarios. They run visual playbooks against their legacy db (splunk, elastic, sql, ...) for faux-graph queries, or their new graph db for deeper ones (e.g., path queries). So we're more of the system of record + superpowers for your investigations, kind of like a smarter version of what Tableau/Looker do for SQL.
Does anybody reading this have any experience with the performance of AWS Neptune? I would be very interested to hear about the performance of deep graph traversals on large sharded graphs.
NoSQL is a very wide field. There are lots of special cases in which certain NoSQL data stores can help a lot. One of the points the posted article makes is that native multi-model databases help you not to sacrifice relational and yet reap some of the benefits certain NoSQL data models (like graph) can provide.
In discrete mathematics this happens a lot: group orders, sizes of conjugacy classes, semigroup orders, numbers of isomorphism classes, character degrees, etc.