Said it before, will say it again... "MongoDB is the core piece of architectural rot in every single teetering and broken data platform I've worked with."
The fundamental problem is that MongoDB provides almost no stable semantics to build something deterministic and reliable on top of it.
As a guy who works on ACID database internals, I'm appalled that people use MongoDB. You want a document store? Use Postgres. Why on earth would you use a database that makes so little in the way of guarantees about what results you get from it? I think most people have really low load and concurrency, so things seem to work. When things get busier you're in for a world of pain. Look, I get that it's easy to use and easy to get started with, but you're going to pay for all of that later.
> Why on earth would you use a database that makes so little in the way of guarantees about what results you get from it?
Because some people can't stand having to work with SQL, migrations, schema and constraints, it's as simple as that. (That's not my opinion, that's just the rationale behind MongoDB.) Even if you use Postgres with the JSON column type, you still need to write SQL queries and schemas.
In the context of analytics, it might make sense; I'm not a big data analyst, but I've seen MongoDB used to centralize logs.
My company used mongo for years before we got our shit together.
Schemas were always implicit (until we got our shit together and started defining and enforcing them with Python Schematics; rough sketch at the end of this comment).
Migrations were crazy scripts you'd run in prod or hacks you'd stick into your code to "transition".
And yes, surprise constraints left and right causing awful anti-patterns. One-character key names to save disk. Hashed values for indexed keys to save memory. Awkward structuring to improve query performance.
The worst part is, we now have tons of important data in these databases and almost no one understands the legacy crazy app logic that makes them tick.
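For anyone curious, making one of those implicit schemas explicit with Schematics looked roughly like this (the model and field names here are invented for illustration, not our real ones):

    from schematics.models import Model
    from schematics.types import StringType, IntType, DateTimeType

    # A document shape we had previously only agreed on verbally.
    class UserEvent(Model):
        user_id = StringType(required=True)
        kind = StringType(required=True, choices=["signup", "login", "purchase"])
        amount_cents = IntType(min_value=0)
        created_at = DateTimeType(required=True)

    doc = {"user_id": "42", "kind": "login", "created_at": "2016-05-01T12:00:00"}
    UserEvent(doc).validate()  # raises if the document doesn't match the declared schema

Once every read and write goes through a model like that, the "schema" at least lives in one place instead of in everyone's heads.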
> used mongo for years before we got our shit together.
That's actually a legit use case. Use MongoDB while you get your shit together. I use global variables while I'm noodling around in code. Eventually I refactor.
I think this is a recipe for disaster. First, there are basic things that you should do from the get-go, e.g. not using globals. Second, the problem with "eventually I'll do it right" is that by that time, your stuff is out in the open, used by clients and heavily depended upon, and you have no way of refactoring. A company that uses a bad piece of technology will suffer many years before they could replace it.
Depends on the language, I suppose. I'm more productive with Python when I write everything procedurally and refactor into functions, classes, etc. every dozen lines or so. It's more fun than writing UML diagrams (and seems to produce better code, too!).
Or do you think so long that your head aches and your colleague Hephaestus splits you open to find a fully-formed cooperative multiple inheritance hierarchy?
The problem is, in a great portion of real world projects "eventually" never comes and there's just no time for any major refactoring or replacing technologies since you are too busy implementing the feature that was needed two weeks ago.
I've often dreamed of a specific type of software built and released as "prototypeware", where any app created using it will have certain built-in scaling limits—and going past them will irrevocably force the app into a read-only mode. It would warn anyone monitoring it well in advance of hitting such a limit, of course. But there'd be no way to just slide the limit upward or otherwise tarry. It'd force the migration to something better just as if it were a Big Customer with Enterprise Compliance Demands.
If an enforceable mechanism like that existed, I'd be a lot more confident in mocking things up. Stick SQLite in for the database, munge HTML and Javascript together, whatever—it's literally going to slap away the hand of anyone who tries to use it on a production workload, so why not?
(Going further, it'd be interesting to create some sort of quagmire of a software license, specifically for prototypeware, such that you'd be forced to rewrite all the prototype code instead of reusing even a hair of it in production. Maybe something like reassigning the IP to a trust, with the trust having an obligation to sue anyone and everyone who tries to create derivative works of the code they've been handed?)
This will not work. The whole "prototype" idea assumes that once you grow out of the "prototype" phase you have the time, money, manpower, etc. to rewrite the whole thing based on solid, powerful technology and tools. That is, more often than not, not the case.
The first problem is that every tool has demands, especially the limited ones, and you end up writing your application around those limits and demands, using platform-specific code that will have to be discarded and re-written come the migration.
The second problem is that these tools dictate design, and once you try migrating, you still have an application designed around the prototype tools, which make a lot of concessions and have design flaws because of that.
Finally, I've never understood the need for learning a specific tool, platform or language for "rapid prototyping". Use the tools you will use eventually, it's not that building something in, say, Java from scratch will take an order of magnitude more time and effort than building it on Node.js, despite all the hype, especially if you're a Java shop.
> it's not that building something in, say, Java from scratch will take an order of magnitude more time and effort than building it on Node.js, despite all the hype, especially if you're a Java shop.
I think we're picturing different things here. You're picturing having software engineers make the prototype, and then having the same engineers do the final implementation. Meanwhile, I'm picturing two different teams, with different competencies—one who knows a prototyping toolchain backward and forward and is extremely productive in it, and the other who knows a solidly-architected platform just as well.
The classical pipeline in the animation industry is to have two separate "teams" of artists. One team does concept illustration and storyboarding, and the other does keyframe animation and in-betweening. The first of the two teams is essentially a team of prototypers. Their output is a product which stands on its own for internal evaluation purposes—but which isn't commercially viable "in production." (Nobody really wants to watch 1FPS sketches.) So, after the storyboarding is complete, the whole product is redone by the actual animators into the more familiar product of 24FPS tweened vector-lines or CGI model-joint movements.
The more familiar case of this for web development is where the "prototype" is a PSD file. Professional capital-D Designers are usually Photoshop experts—they're very productive in it, and can mock up something that can be evaluated for being "what the customer wants" quickly, with rapid iteration if it's not right. Once they've got the customer's sign-off, their output product—their prototype—can be tossed over to development staff to "make it work." (There are also an increasing number of interaction-design prototyping apps targeting the same set of designers, under the theory that they'll be able to become productive in quickly iterating the "feeling" of an app with a customer in the way they're already doing with the "look" of the app. I haven't met a designer that uses one of these professionally, but I think that's mostly because there aren't any of these yet well-known enough to be taught in art schools.)
But when it comes to workflow and use-case design, we don't really see the equivalent pipeline. Looking through the lens of separated "prototyper" and "engineer" roles, there are clearly tons of software-development tools that were intended to be used purely by "prototypers": Rails' view scaffolding, for example. But since this role isn't separate, these things get used by engineers, and sneered at, since, as you said, it's no more effort—when you're already an engineer—to just engineer the thing right from the beginning.
Interestingly, all of the true examples of workflow prototyping I can think of come from the specific domain of game development—but even there, nobody seems to realize that prototyping is the goal of these tools, and tries to misuse them as "production" tools. RPG Maker, seen as a tool for making a commercial RPG, is total crap. RPG Maker, seen as a tool for prototyping an RPG, is an excellent tool. Its output is effectively a sketch, a cartoon in the classical sense:
> The concept [of a cartoon] originated in the Middle Ages and first described a preparatory drawing for a piece of art, such as a painting, fresco, tapestry, or stained glass window.
A cartoon is a prototype used to communicate intent. Yes, you (as the producer of the finished piece) can cartoon together with a client to iterate on a proposal. But much more interestingly, a client can learn to cartoon on their own—and then, in place of a long design document, they can submit their cartoon to you. An RPG Maker game project is the best possible thing I could hope to receive as a design proposal from a client asking for me to make an RPG. It forces all the same decisions to be made that making the actual commercial game does—and thus embeds the answers to those decisions in the product—but it doesn't require the same skillset to create that the commercial game does, so the client can do it themselves. The prototyping tool, here, is doing the "iterating on a design together" job of the designer for them.
We do have one common prototyping tool in the software world—Excel. A complex Excel spreadsheet is a cartoon of a business process, that nearly anyone can make. We as engineers might hate them, because people generally have no sense of project organization when making them—but every project to convert an Excel "app" will take far less time than one that involves collecting the business requirements yourself. The decisions have already been made, and codified, into the spreadsheet. You don't have to sit there forcing the client to make them. The process of cartooning has forced them to do it themselves.
---
To summarize: software prototyping tools aren't for engineers—if you have an engineer's mindset, you'll prototype at the speed of sound engineering practice, so prototype tools won't be any help to you; and you'll be more familiar with the production-quality tools anyway, so you'll be more productive in those than with the prototyping toolset.
But software prototyping tools definitely have uses: they can help designers to iterate on a "functional mock-up" to capture a client's intent; or they can even help clients to create those same mock-ups on their own. This is why "prototypeware" makes sense as software—but also why it should be self-limiting from being used in production. The prototype app wasn't created by someone with an engineering mindset—so there's no way it could end up well-engineered. Its purpose is to serve as a cartoon, a communication to an engineer; not to function in production on its own.
(Mind you, prototypeware could be made to function as an MVP in closed-alpha test scenarios, in the same way that the MVPs of many startups are actually backed by manual human action in their early stages. The point there is to test the correctness of the codified business process, rather than to support a production workload.)
There is nothing as long lasting as a temporary solution.
I've just fixed up some code marked "proof of concept" that had been in production for a decade...
Admittedly some people's PoC work is better than what some consider to be release ready, but still this was not intended to be in that state for that long.
I don't think that is an apt comparison. Replacing your database backend, at the minimum, usually requires a massive migration of data, and possibly even changes to your entire architecture.
A refactoring does not change behavior, and can be performed in minor -- and in your example of a global variable, perhaps even trivial -- increments.
Sure, there's a continuum of refactoring, from trivial to complete re-write.
Any time the data schema(s) change, you need to migrate. I'll bet that even when sticking with the same database flavor you'll need to migrate a handful of times over the first few months. Requirements change, blah, blah. After the first couple migrations, you refactor to make that less painful. Eventually it might get to the point that your persistence layer is fairly abstracted and you can change databases without ripping apart everything else. Doesn't happen with every project, but sometimes.
My concern would be whether Mongo will cause me to lose data... "To recap: MongoDB is neither AP nor CP. The defaults can cause significant loss of acknowledged writes. The strongest consistency offered has bugs which cause false acknowledgements, and even if they're fixed, doesn't prevent false failures." https://aphyr.com/posts/284-call-me-maybe-mongodb
...or get my data corrupted: "When MongoDB is all you have, it’s a cache with no backing store behind it. It will become inconsistent. Not eventually consistent — just plain, flat-out inconsistent, for all time. At that point, you have no options. Not even a nuclear one. You have no way to regenerate the data in a consistent state." http://www.sarahmei.com/blog/2013/11/11/why-you-should-never...
When you refactor or rewrite your code, you have the old code in version control, can write tests to confirm that it still works as expected, and there's no inherent time pressure.
If you pick an unreliable database and your data has been or is being lost and/or corrupted, it's more like a "try to stop the bleeding before the patient dies" situation.
That's not the time I want to be considering changing databases.
Likely better to scrape the data out through the app (if it's a web app) than to try to talk to that sort of database directly. The app would at least put names to everything.
I often liken NoSQL databases to dynamically typed languages.
With a NoSQL database, you have an implicit schema, but it will only be enforced and fail at runtime - when your code expects a field but fails to find it, for instance.
With a dynamically typed language, you have implicit types, but only enforced at runtime - when your code expects a value to be an int but finds a boolean, for instance.
And both are fine, there is a need for both. I can see how the flexibility of being able to change, well, everything by just flipping a switch in your head ("this is an int now") might be helpful for, say, data exploration problems.
It's just that in a production environment, these features of NoSQL databases and dynamically typed languages turn into massive sources of problems and oh god, just don't.
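A contrived Python sketch of both failure modes, just to make the analogy concrete:

    # "Schemaless" document store: the schema exists, but only in the reader's head.
    old_doc = {"name": "Ada"}                       # written by last year's code
    new_doc = {"first": "Ada", "last": "Lovelace"}  # written by this year's code

    def greet(doc):
        return "Hello, " + doc["name"]   # KeyError at runtime when given new_doc

    # Dynamic typing: the wrong type also only shows up at runtime.
    def total(price, quantity):
        return price * quantity          # fine for ints, silently nonsense for price="9.99"

Nothing complains until production traffic happens to hit the one code path that disagrees with the data.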
You and Lazare are right on the money. And the thing with the database is that the code that inserts/updates it has to agree with all the querying code about what the implicit schema should be - but it's implicit and scattered around your code - so on a large team it's very hard for everyone to understand that implicit contract and it's going to be a constant source of production bugs.
Schemas don't change that much compared to code, having a strict schema enforced by the database saves you so much time and pain and downtime in the long run.
This list makes me want to cry a little. It rings too true.
> Have migrations (except they're going to be some scary ad hoc nodejs script that loop through your document store and modify fields on the fly).
I literally just spent the better part of tonight AND yesterday evening dealing with one of these scripts. I had pulled down the production table to locally test the script (gross), but when I later ran it in the production environment, we'd somehow had an array sneak into what was an object field. The whole thing just felt like a mess.
Because you can't just test it on one document and see if it works; you have no guarantee that all the documents will be identical. And if the migration script crashes halfway through... oh man.
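For the curious, those scripts tend to look something like the pymongo sketch below (collection and field names made up). Note there is nothing transactional about the loop:

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["app"]

    # One-shot "migration": split the old `name` field into first/last.
    for doc in db.users.find({"name": {"$exists": True}}):
        parts = doc["name"].split(" ", 1)   # blows up if `name` turned out to be an array
        db.users.update_one(
            {"_id": doc["_id"]},
            {"$set": {"first": parts[0], "last": parts[1] if len(parts) > 1 else ""},
             "$unset": {"name": ""}},
        )
    # If this dies halfway through, some documents have `name`, some have
    # first/last, and nothing records which -- exactly the mess described above.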
> And if the migration script crashes halfway through... oh man.
Schema-issues and typing aside, I looked at MongoDB just long enough to find out there are no transactions, then ran away, quickly.
For a lot of tasks, I guess I would find MongoDB very useful, but lack of transactions is a complete deal breaker for me. Not having a real schema, referential integrity and all that makes them even more important, IMHO.
At work, I have had more than one quickly-hacked-together Perl script crash on me in the middle of a run. Having proper transactions has saved my butt repeatedly.
Mongo has its weaknesses, yes. Its main strength is usually cited as its simplicity, or that it's quick to get something out the door.
I agree with your last comment. I can't help but laugh at people who think they can get away with designing a database with no schema. Schemaless, for me, meant that unless you enforce constraints, there won't be any.
There's a reason why there are ORMs even though Mongo drivers are sufficient for most cases.
1) I've always designed my data with future changes in mind. I often spend up to an hour thinking about the possibilities of data that I want to store in a collection before writing the schema. The flexibility I have with Mongo is that if I think I need a field but am unsure of the exact data type to store (is it a string, an array of strings, or an array of objects with strings?), I just leave the field as an object and change it later. The plus being that as long as I haven't stored anything in that field, I can always change its type without a 'migration script'.
I've only needed to 'migrate' by updating documents 4-5 times. When GeoJSON landed, and a few other times when I needed small changes to my data.
3) The only way I can think of enforcing constraints on < 3.2 is through indices, which is insufficient. Most ORMs do the enforcing. I've never needed to enforce them at an app level.
I've used MongoDB primarily for its Geo support, and JSON enabling me to get things done quicker relative to maintaining SQL tables. I've got a small but interesting use case, public transit. https://movinggauteng.co.za and https://rwt.to.
When I started with the projects, PostgreSQL + PostGIS felt like a black box, and I wanted something that would give me ease and flexibility. At the time hstore was the talk of the day, but seemed to not meet my needs.
It would now with JSON, but I'll stick with Mongo for now.
Exactly. While the process of designing the structure of your data can make you feel like “you're not getting real work done”, in the long run, it actually prevents headaches caused by inconsistent data. Data always has a structure, it's just that some people are too lazy or mentally feeble to figure out what it is.
For me that is the most important aspect of starting / designing an application. If the data model is accurate, then the code falls into place easily. If it's not quite right, more and more code ends up in the application trying to make up for the poor data model.
My first task in any project is to design the whole data model based on current requirements and while designing it I think of the interfaces and how would they read and write data (to refine requirements). Writing views and actions/APIs on top of well-formed data model then becomes a breeze.
Agile doesn't mean "don't gather requirements or plan anything." It just means that you evaluate your results frequently and maybe change course, instead of waiting until the end when you're "done".
To add to your point about schemas. The new generation has not learned that the data almost always outlives whatever throwaway front end was written to work with said data. Tying the data to some sort of flavor of the month framework is setting up for all sorts of pain later.
I despise mysql, but even it is better than mongo. At least with it I can easily transition the data to many different uses.
Also, and a point I find amusing, is that many users of NoSQL claim schemaless and then go and write a layer on top of the datastore to enforce a schema. It would have been so much simpler to use an RDBMS out of the gate instead of badly implementing one.
> If you're EVER going to read the data back and do anything with it, it has a schema.
You are giving the NoSQL crowd too much credit. Some abominations have no recognizable schema at all. The data store will just contain arbitrary dump of data which different developers decided their "schema" should be. The number of "columns" will vary, the "columns" will have arbitrary formats, so on and so forth.
If one developer decided to separate name into "first: John", "last: Doe", you will have that. If another decided to have "name: John Doe", that's what will be there. If one developer decided social security should be "SSN: 123-45-6789" and another decided it should be "SSN: 123456789", well, you are going to have fun cleaning up the data at the business or even application layer.
But that's not even the big issue with MongoDB. It's their lack of ACID compliance!
> Because some people can't stand having to work with SQL, migrations, schema and constraints
The real question is “How come these people are allowed anywhere near data stores?” SQL isn't ideal, but how many of the alternatives are better at protecting the integrity of your data?
Thank you. Not everything is easy. This is the difference between engineering and 'hacking'. Hacking is not something to aspire to; it's something you do because of crushing, external pressures.
> At the end of the day, a SQL database doesn't represent the data in a way the programmer uses the data.
Errrr, that's exactly what they do, unless you've got a terrible schema and haven't thought about your data enough. The thing about SQL databases is that you can use the power of SQL to fetch the data in any representation you want.
ORMs are a really bad attempt to force a square peg into a round hole. The mismatch between the relational model and object-oriented design principles is simply too big.
In the relational model:
(0) A relation is a collection of tuples of primitive values. Every relation has a relation schema, which determines the arity of its tuples and the type of each tuple component. In other words, the relational model is first-order.
(1) There are a few basic operators for computing relations from other relations (relational algebra).
(2) There is a mathematical theory (database normalization) of how to design primitive relation schemas to avoid storing duplicate information, and running into insertion, update and deletion anomalies.
On the other hand, in a pure object-oriented program:
(0) An object is a collection of data and operations on it. The data is hidden from the rest of the program, so the only way to operate on it is to use the object's operations. The operations may take objects as arguments and return objects as results, so objects are intrinsically higher-order.
(1) In general, there are no limits on how one can define a single object's operations. However, it's impossible to define operations which require knowledge of the internal representation of two or more objects at a time.
(2) There are heuristic guidelines (e.g., SOLID principles) for designing flexible object-oriented systems. However, they lack any sort of rigorous foundation beyond “it seems to work in practice”, so object-oriented designers may deviate from these guidelines at their own discretion.
---
For data-oriented applications, it's pretty clear to me that the relational model has important advantages over object-orientation:
(0) The decoupling between data and operations allows the database designer to focus exclusively on data integrity constraints, instead of anticipating whatever queries users will want to make.
(1) The limited expressiveness of relational algebra (with no recursively defined relations) is also a blessing, because it makes automated query optimization tractable in practice.
While objects present problem after problem:
(0) Object graphs are intrinsically directed, and must be traversed in the direction of their links. This makes queries less declarative.
(1) Objects have a notion of identity, which destroys many opportunities for using equational reasoning to build large queries. This also makes queries less declarative.
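A toy Python sketch of the directionality point (data invented): flat tuples can be queried in either direction, while an object graph has to be walked the way its links happen to point.

    # Relational-style: facts as flat tuples; ask the question in any direction.
    employs = [("acme", "ada"), ("acme", "bob"), ("initech", "carol")]
    employees_of = lambda company: [p for (c, p) in employs if c == company]
    employers_of = lambda person: [c for (c, p) in employs if p == person]

    # Object-style: links are directed, so the reverse query needs extra plumbing.
    class Company:
        def __init__(self, name):
            self.name, self.employees = name, []

    class Person:
        def __init__(self, name):
            self.name = name             # no back-pointer to Company

    acme, ada = Company("acme"), Person("ada")
    acme.employees.append(ada)
    # acme.employees is a field access; finding ada's employer requires either a
    # back-reference or a scan over every Company object in the program.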
Of course, the relational model says nothing about general-purpose programming, whereas object-orientation does. But there exist other paradigms for general-purpose programming that are less badly in conflict with the relational model. For instance, functional and logic programming:
(0) Don't reject the use of first-order data, decoupled from operations.
(1) Prefer the notion of mathematical variable, whose meaning is given by substitution (a first-order operation), to imperative assignment, whose meaning is given by certain predicate transformers (intrinsically higher-order gadgets).
Is having seen something used some way a leading indicator of it being a good idea to have used that thing that way?
Because I've seen Excel used as a database with all kinds of macros and VBA scripts bolted on/embedded to provide the workbook various shapes of stored-procedure and query capability... but, while sorta impressive in a "Holy crap, lol wut?" kind of way, I'm not sure any instance I observed of uses like that was actually a good idea. Full of epic cleverness and ingenuity? Definitely. A good idea? Probably not.
Did they make the company a lot more money than they cost? If so they were probably a good idea. Not all code needs to be "pretty" to serve a purpose. I've seen some pretty epic hacks that I know generated hundreds of thousands of dollars of new revenue.
They often had a significant "bus factor" problem as a result of this in the best cases, and in the worst cases these mountains of hacks were a massive impediment to growth and/or evolution to meet changing marketplace demands... despite being a central pillar of data management and revenue as it existed in the status quo.
In my experience in market research, advanced spreadsheet programming with macros and pivot tables and whatnot are more a contemporary incarnation of Reporting than raw database querying and operations.
So my question is then: Why not use CouchDB instead? I don't see what Mongo gives you over that and CouchDB is at least dependable and predictable in its operation.
CouchDB is too reliable and actually fsyncs your documents to disk. That is plain boring. I like to live on the edge and have some documents go to /dev/null once in a while. Life is just more exciting that way ;-)
I really like Couch, I wish it had more adoption than it seems to have and that its ecosystem was more mature than it seems to be... and that javascript wasn't its first class citizen.
But it's a really cool database (though I'm partial to RethinkDB now).
Even Javascript is sort of a second-class citizen in Couch. The real first-class citizen is native Erlang code running unsandboxed in the server context. If you want high(er) performance, that's where you go. (Alternatives to this have been discussed, like embedding the luerl Lua interpreter to give the option of a sandboxed programming target without the IPC cost. Nothing in the immediate pipeline, though.)
Me too. RethinkDB is my document database of choice these days. In my experience, it's proven to be reliable and fast, and the development team is very responsive and helpful. They also seem quite mindful when it comes to new features and will delay things for years (e.g. auto-failover, which they now support but it took a while) if rushing them would impact quality.
That's what I want from a database: first and foremost it must be solid and not lose my data. Everything else (including high availability) can come after.
Is CouchDB still alive? I spent a weekend playing with it in January, but it seemed to be a very quiet project, with the last stable release being almost two years ago.
Most of the activity happens at Couchbase now, the company that the inventor D. Katz founded based on CouchDB technology. You can still use Couchbase for free, but it's possible to pay for support. The coolest thing they have is Couchbase Lite, the mobile version of CouchDB, which lets you replicate with your server. I find it a very interesting alternative to Core Data, Parse and co., and we use it in production.
> some people can't stand having to work with SQL, migrations, schema and constraints, it's as simple as that
Use the right tool for the job, right? Admittedly something like MongoDB could be the right tool for the job (examples around here include RethinkDB and CouchDB). MongoDB, however, is like a hammer with no head.
It has been pointed out before that json(b) has problems with indexing. IIRC the cost estimates of indexes on JSON data are static, and therefore very rarely accurate. I'm terribly sorry but couldn't find a reference with 5 min of searching. I still like postgres over mongo.
I guess the workaround would be creating indexes on computed columns that query from the json data, together with changing one's queries to use that computed field. For example, with a json column storing names in various places, a computed column could collect all of them in an array. An index on that computed column will have good statistics.
Bottom-line: if you want your queries to run fast, you will have to tell your store what kind of data you have and what kind of queries you will run. Otherwise, there's little the store can do.
Having a traditional database with various constraints is a way to give that information. With json columns, you may have to do it in another way (for now).
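A rough sketch of that workaround via psycopg2 (table and column names made up): the documents stay in one jsonb column, but the field you actually filter on gets an expression index, which has its own statistics.

    import psycopg2

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()

    cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body jsonb)")

    # Expression index on the one field we query by; ANALYZE then gives the
    # planner real statistics for it instead of a generic guess.
    cur.execute("CREATE INDEX IF NOT EXISTS docs_email_idx ON docs ((body ->> 'email'))")
    cur.execute("ANALYZE docs")

    cur.execute("SELECT body FROM docs WHERE body ->> 'email' = %s", ("ada@example.com",))
    conn.commit()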
> Because some people can't stand having to work with SQL, migrations, schema and constraints
So use an ORM that understands Postgres' JSON columns. Don't need to write a single SQL statement, automagic migrations, no explicit schema (unless you make one), no constraints (unless you add them).
It works great, we did a rather large project last year using Django's ORM and postgres where we didn't know the final data schema until months after launch.
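In case it helps anyone, the setup is roughly this (model and field names invented); JSONField here is django.contrib.postgres.fields.JSONField:

    from django.contrib.postgres.fields import JSONField
    from django.db import models

    class Event(models.Model):
        kind = models.CharField(max_length=50)   # the bits we were sure about
        payload = JSONField(default=dict)        # everything still in flux

    # Querying into the JSON, still without hand-written SQL:
    #   Event.objects.filter(kind="signup", payload__source="mobile")

As fields stabilise you can promote them out of the JSON blob into real columns and add constraints.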
Never heard of ToroDB. Just checked out the website and it looks interesting, however the tagline "The first NoSQL and SQL database" is untrue.
At least OrientDB has had both schema+schema-free and SQL + NoSQL querying interfaces.
That is, you can optionally supply a schema for your documents. IIRC you could choose either schemaless, schema or mixed (where mixed allows fields not in the schema to exist as schemaless fields).
The default query language was SQL with "enhancements" (to allow for graph traversal), but you could also query with Gremlin. Not sure if this is still the case or not as I don't use OrientDB.
The above was true in 2012 and possibly a lot earlier. I see ToroDB's first Github commit was in 2014.
It does work for a certain volume of data. You can index fields you're interested in, even do so after the fact, and it's like any other database in that case. And sometimes you have small apps that do need complete historical log data, so Kafka et al just introduce unnecessary complexity since you'd need to aggregate into a key value store anyways.
But if you do this, god forbid you go beyond where indices can fit in RAM of a single machine. And you will do so, with probability one given your product doesn't shut down. So you're running a gauntlet against a redesign.
It’s useful for prototyping. When you don’t know which schema you’ll end up using, having a *SQL database is tedious because you have to do migrations every time you change the schema. Once you’re done prototyping you can switch to a better alternative.
If you need to preserve data between application versions then you still get all the headaches with MongoDB (either migrating the data or supporting multiple schema versions when you read the data, oh the fun!).
If you don't need to preserve data between versions then you don't need to write migration scripts in SQL; just scrap everything and pretend it's the first version of the application.
I guess this is a question of how useful / deployable the prototype should be. Why not just have an in-memory object cache, literally a hashmap, for your DAL? If you're composing app-level code, you don't need to know what the backend does to your data. You could even create a simple method to populate the data at app boot in the dev profile. When you figure out the storage requirements and finalize the model, build your db.
This would save you time on picking the db, schema changes or even migration changes in Mongo. You don't have to worry about bad documents from an earlier app revision.
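Something like this, say - the whole DAL is a dict behind two functions until the storage requirements are actually known (names hypothetical):

    import itertools

    # Throwaway in-memory "database" for prototyping the app layer.
    _rows = {}
    _ids = itertools.count(1)

    def save(doc):
        doc_id = doc.get("id") or next(_ids)
        _rows[doc_id] = {**doc, "id": doc_id}
        return doc_id

    def get(doc_id):
        return _rows.get(doc_id)

    # App code only ever calls save()/get(); swapping in a real database later
    # means reimplementing these two functions, not rewriting the app.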
In that case you can start with postgres and stuff your documents in a single json column, accessing it just as you would have in mongo while you're prototyping and don't care about speed and indexing, and when you're done, you can just change that table to a more proper structure without changing databases.
Exactly. Use the right tool for the right job. Start prototyping and development with MongoDB and then migrate to Postgres, or Cassandra, or whatever suits your use-case better.
This is key and often overlooked - MongoDB is so popular not because it's the best database but because it's so easy to get started with. Download/unzip/run to have a database engine ready. It also helps that you can immediately store anything without any prior setup steps.
Postgres/mysql/sqlserver/etc are nowhere near as easy to install, as fast to get started with or as portable to move around.
The Postgres folks should listen to this and have a simple getting-started guide for OS X, Windows and Linux. I tried brew install postgresql. There was no single place that told me how to start the server, access the command line, create a db, etc.
On OS X there is the fantastic http://postgresapp.com/. It installs into /Applications so it is easy to remove, and comes with a start/stop GUI and taskbar icon. Great for local development.
But installing and configuring Postgres "properly" on a server is still something of a challenge. Do I need to modify random_page_cost on a SSD or not? What are good memory limits on modern big servers? What exactly needs to go into pg_hba.conf?
None of these seem too difficult after reading a few tutorials and wikis, but it would be nice if the server set itself up with reasonable defaults based on the machine it's running on.
Getting started with PostgreSQL on Linux is actually trivial. What is annoying though that there are lots of guides which talk about editing pg_hba.conf which is not necessary for the simplest setup. The default pg_hba.conf is good in most distros.
We must have different definitions of trivial compared to what I had to go through every time - it's a mess of an install process that takes tweaking config files just to have it even listen to external requests.
With the ease of services like AWS, we never installed a database server. Pick a database flavor, version, click, click and you're up. I suppose designing the schema takes a little effort, but I find it much easier than properly architecting software.
Many if not most installations are still being done on actual dev machines and servers. While RDS and other managed services are nice, they're just a small fraction of the usage.
Also, the fact that managed services help so much only speaks to how difficult these relational databases typically are to work with operationally.
If you're on a Mac you can download postgresql.app[0], which puts a small icon in the top-right status bar. You don't have to set up users or permissions or anything; it's super easy. Getting it on prod can come later, but for the first five minutes it works.
(granted, this neglects contrib extensions like hstore)
It's not just installing for development, mongodb (other than its ridiculous clustering) is very easy to install on production servers as well and moving an installation is basically just zipping up the folder and moving it somewhere else.
That sets up postgres. It doesn't let you get started doing CRUD operations inside your postgres though - which is where MongoDB shines, "just store my data, fuck it"
> Mongo is a dumb, dead-end platform, but they know how important ease-of-use is.
By “ease of use”, do you mean “ease of making something that seems to work” or “ease of making something that actually works”? I've never used a schema-free database, and ended up thinking to myself “I'm completely sure this database can't possibly contain garbage data”. Or do programmers simply not care about data integrity anymore?
The single biggest source of grief in our production database has been the one JSON field we used once to avoid adding another table. That goddamn thing has crashed the server so many times with invalid data, that I'm never using anything schemaless again. We recently migrated to a proper table and I'm thanking my lucky stars I finally got rid of that devil.
You can insult the developers that use Mongo or you can look at how to get those users onto a better platform. With the modern expectations of full-stack development, is it any wonder that something promising simplicity and zero-configuration data storage does well?
Well, I have used one and had that assertion. Ease of use means making something that works. It seems we're condoning lack of knowledge or experience with programming. If you've never used databases in your life, whether you're using SQL or NoSQL you'll likely end up with rubbish data. In the SQL world it could be that you're storing time in the wrong format, concatenating long fields, or not normalising when you should or the other way around.
In NoSQL you could be reinventing the wheel, or storing data that you can't query efficiently because you can't index it well etc.
All the excuses for not using document stores beyond ACID really sound like people don't know what the heck they're doing.
> Well, I have used one and had that assertion. Ease of use means making something that works.
For me, it means, under no circumstance, no interleaving of transactions or scheduling of commands, nothing, nichts, nada, can the database be in a state where a business rule is violated. If I need to worry what silly intermediate transaction state can be observed from another transaction, or if I need to worry whether a master record can be deleted without cascade-deleting everything that references it, then the DBMS has failed me.
> not normalising when you should or the other way around.
I've never seen a situation where anything less than 3NF (actually, ideally, at least EKNF) is acceptable.
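The referential-integrity point is the kind of thing I mean: the database itself refuses to orphan rows. Toy sketch - even SQLite will do this once you switch foreign keys on:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")   # off by default in SQLite
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.execute("""CREATE TABLE order_lines (
                        id INTEGER PRIMARY KEY,
                        order_id INTEGER NOT NULL REFERENCES orders(id))""")
    conn.execute("INSERT INTO orders (id) VALUES (1)")
    conn.execute("INSERT INTO order_lines (id, order_id) VALUES (1, 1)")

    try:
        conn.execute("DELETE FROM orders WHERE id = 1")  # would orphan the line item
    except sqlite3.IntegrityError as err:
        print("refused:", err)   # FOREIGN KEY constraint failed

No amount of careful application code gives you that guarantee against every other code path touching the same data.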
What they neglect to mention is not that you don't need a DBA, that you are the DBA yourself. With all the responsibilities that go along with that role. Who's getting paged at 3am now...?
If that's really a problem, host your database on a cloud platform and let those guys do the job of a DBA. Works for me at present. I am aware it's not going to be a solution for everyone, though it still sounds a lot better than being your own Mongo DBA.
The DBA is the role that is responsible for the organization's data. Even if you outsource the routine tasks such as "doing backups" you still need someone to assume that role.
Yes, it will help you to cover cases like where the server physically explodes, but that's basically irrelevant, most problems where you need a DBA are caused either by data corruption caused by application code or developer, or performance issues caused by DB structure - in those cases the cloud platform won't do anything for you, they just host the server. They can restore backups, do monitoring and tune the server, not your particular app/db structure - but all the big problems are there.
"most problems where you need a DBA are caused either by data corruption caused by application code or developer, or performance issues caused by DB structure"
Is running Mongo going to solve any of those problems?
Without a rigidly enforced schema I would guess those problems are going to be amplified rather than solved.
This will be highly subjective but you need to get over the "postgresql has the longest feature list so why don't you use it". The last startup I have been involved with tried to use PostgreSQL and needed to move to MySQL (yeah, well) because commercial support was both more expensive and less useful than what we were able to get for MySQL. Perhaps today it's different.
While I no longer use PostgreSQL much, every time I need to touch it, it seems rather developer-unfriendly; just last month I found MySQL, heck even SQLite supports triggers with code inlined into the trigger body but PostgreSQL mandates writing a separate function for the trigger. And, of course, it needs to be in plpgsql because reasons. The most trivial "let's calculate another column" becomes a complicated nightmare.
So then if you don't want to use PostgreSQL what then? The answer now is MySQL, again, because 5.7 has JSON.
And mind you, I have grown to dislike MongoDB slowly over the years as new types of queries have appeared and it's a complete mess by now. There was an excellent article on this posted on Linkedin of all places this March https://www.linkedin.com/pulse/mongodb-frankenstein-monster-...
It's really interesting how MySQL is the most usable and most supported database by now...
> This will be highly subjective but you need to get over the "postgresql has the longest feature list so why don't you use it".
I think if you re-read it, you might see that at no point did the post that you're replying to imply that Postgres was preferred because it had a longer list of features. They're speaking entirely about the strong guarantees that an ACID system gets you.
Document stores are only mentioned because this is one of the (incorrectly) perceived advantages that Mongo has over Postgres and other databases.
"just last month I found MySQL, heck even SQLite supports triggers with code inlined into the trigger body but PostgreSQL mandates writing a separate function for the trigger"
The right response, as a postgres developer, is to agree that you describe a useful feature, and perhaps implement it to help other users.
But my advice to you is to be willing to put up with some short-term annoyances. Sometimes the best choices are a little annoying, and if you refuse to consider them, it will cost you (or your employer) much more later.
It is listed on the TODO https://wiki.postgresql.org/wiki/Todo page, apparently since 2012. I haven't coded in C since 1998, I do not think you want me to touch the PostgreSQL code base.
> just last month I found MySQL, heck even SQLite supports triggers with code inlined into the trigger body but PostgreSQL mandates writing a separate function for the trigger
I found something similar (and in the last month too) – insofar as we're talking missing popular features – but with MySQL's and Postgres's positions reversed.
`ALTER TABLE ... ADD CONSTRAINT CHECK ...` runs on MySQL without an issue, and so does any INSERT or UPDATE violating that CHECK constraint. A bug was filed in 2004.
My response to the comment about triggers and PGSQL would be: Postgres tries its best to stop you shooting yourself in the foot.
Similar to the top comment, all the real problems I've ever encountered with postgres (heck, all major RDBMS's for that matter) come from certain areas, mainly triggers.
A startup whose back end I helped architect uses MongoDB for everything. Before starting the project I asked the CTO not to use Mongo, as it was not the right fit; basically they needed more relational stuff. The CTO chose Mongo because he figured every startup uses it, so why not us? Now they are suffering because they need ACID and relational features. They want to rewrite to Postgres, but they are heavily invested and it's not easy to go back.
Postgres wasn't always a great document store...there was definitely a time period where if you wanted to take a document-oriented approach to data modeling, MongoDB was a good way to go. JSONB was only added in the last minor version of Postgres, and while the JSON and HSTORE types were available, it didn't give you quite the same speed. Now that JSONB is a thing, I think the two databases are more comparable as a document store.
Is that actually true? I mean the part about Postgres might be, I don't know. But was there a time when MongoDB was a good way to go?
Was there ever a time when it actually worked consistently well at something that was database shaped? Because I started dealing with it in ~2010 I think, and it wasn't a suitable database for anything other than toy projects or throw-away data back then, and while it's many versions newer, it still appears to be pretty fast and loose with its supposed system guarantees.
There was a point when they raised $100+ Million in funding that I thought they'd take that money and actually build a database. At least as recently as last Summer that wasn't a reality yet.
A text/blob field with a normalized key column or two were always vastly superior. We're talking about data loss at an incredible level. I mean, a new Jepsen test comes out and this community goes bonkers over how database X might suffer a split-brain problem for a few milliseconds under an extreme condition but Mongo on a single instance has never been safe, and people are making excuses for it.
> I’ve hesitated to recommend RethinkDB in the past because prior to 2.1, an operator had to intervene to handle network or node failures. However, 2.1’s automatic failover converges reasonably quickly, and its claimed safety invariants appear to hold under partitions. I’m comfortable recommending it for users seeking a schema-less document store where inter-document consistency isn’t required. Users might also consider MongoDB, which recently introduced options for stronger read consistency similar to Rethink–the two also offer similar availability properties and data models.
I've been using Postgres/jsonb for JSON document store. It works OK - the query capabilities are still a little rough (9.5 is better than 9.4), and some frameworks like Loopback don't support JSON in Postgres yet (not sure which ones do), but it's definitely capable and reliable...
What do you think about NoSQL in general? From what I could follow from aphyr, RethinkDB seems pretty awesome. I like it a lot, but I am also not getting a ton of traffic on localhost:3000...
That is an unanswerable question since it's about everything. "NoSQL" is a huge variety of techniques - many of them yet to be invented - that only have one thing in common: "not SQL". From document storage to key/value storage to graph databases. Anyone who tells you what they think "about NoSQL" either has to redirect the question to become a useful one, or, if they actually attempt to answer it, grab your popcorn and expect entertainment at best.
Fair enough. I mentioned RethinkDB above because I find it very intuitive and versatile. I've used MongoDB a fair bit but I like RethinkDB better for a host of reasons. I guess what I meant was, I thought MongoDB was OK, however everyone here seems to have always known it was deeply flawed. I tried to follow the Jepsen report on Rethink but I don't fully comprehend the tradeoffs/benchmarks etc., and was curious what others thought about it.
You're dealing with a torrent of incoming semi-unstructured data, where losing a good chunk of it is minor nuisance because you only need a decent sample, from which you extract data.
In those kind of scenarios, making it easy to work on the code can often be far more important than reliability.
I have a project like that now. I'd love to use Postgres, and probably will eventually once things "settle down" and we know what data we need to store. But for now MongoDB is the "quick and dirty" solution. We define a schema client-side for everything we nail down, so as we nail down more aspects of what data to process, it gets easier to transition to a proper database.
As ORMs get better support for Postgres' JSON capabilities, it will likely get less and less appealing to use MongoDB for stuff like this too.
It HAS them, just not built-in tooling to make using them easy. 2ndQuadrant's repmgr gets you partway there, I'm really hoping to see them revamp it now that pg_rewind is a thing to make restoring a failed master less of a pain in the butt (this is literally the only reason I don't bother with HA right now, it's usually much easier for me to get the DB back online or restore from a barman backup than deal with replication).
If you want that you can always use a variant of Postgres that does, like Greenplum, Citus, and a few others. They're battle proven. There's also MySQL and its variants as well.
Not to mention there are NoSQL alternatives that have a better track record than Mongo, like Cassandra.
Cassandra, HBase etc had checkered pasts with plenty of their own data loss and inconsistency bugs.
Now they are considered two of the most rock solid NoSQL databases. The hatred towards MongoDB really is pretty irrational given just how popular the database is.
FWIW, I don't think Cassandra is particularly any better today semantically than it used to be. Merge conflicts are still resolved at the cell level rather than the row level, and wall-clock time is still how LWW resolution is determined. It lets you mix strongly consistent and eventually consistent data together, which makes no sense.
But the difference is that Cassandra is reliably "broken" in those ways, and as a result there are ways of using it which don't lean heavily on those weaknesses. Such as writing only immutable data or isolating all data that will be used in paxos transactions into their own column families by convention, etc.
Cassandra more or less behaves exactly as it claims that it does. So you can do a somewhat thorough investigation of its system semantics and know what you can rely on and what you can't. MongoDB doesn't even uphold the system semantics it claims that it has, so it's just broken in weird and esoteric ways that you discover mostly by accident.
Scale to what though? RDBMs can easily handle large loads, have replication, etc... At the point where you need true scaling, you'll have a much better idea of your problem and can solve it appropriately.
Why on earth haven't I come across this information before? I spent a crazy amount of time researching frameworks before settling for Meteor, and never came across this.
I worked at a Data Analytics start up in Palo Alto back in 2011 and we had 8 or 9 databases in our arsenal for storing different types of data. MongoDB was by far the worst and most unstable database we had. It was so bad that for the presidential debate, I had to stay up and flip servers all night because even though the shards were perfectly distributed, the database would crash and fail over to two other machines which couldn't handle our entire social media stream. We ended up calling some guys from MongoDB in to help us troubleshoot the issue and the guy basically said "Yeah we know that's a limitation; you should probably buy more machines to distribute the load." I like the concept of Mongo, but there are other more robust NoSQL databases to choose from.
I had a similar experience in 2011 with Mongo, where we were running map/reduce jobs which Mongo advertised support for. The whole system got blocked from running the map/reduce, and the 10gen consultant sighed when we told him we were running map/reduce jobs.
Which isn't the best argument to make against MongoDB since you should have known - it's even part of their course curriculum - that map/reduce is not the optimal way to aggregate in MongoDB. They have their own aggregation framework (https://docs.mongodb.com/manual/core/aggregation-pipeline/).
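For reference, a pipeline looks roughly like this from pymongo (collection and field names made up):

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017")["analytics"]

    # Count events per user for one day -- the kind of job people reach for
    # map/reduce to do, expressed as an aggregation pipeline instead.
    pipeline = [
        {"$match": {"day": "2016-05-01"}},
        {"$group": {"_id": "$user_id", "events": {"$sum": 1}}},
        {"$sort": {"events": -1}},
    ]
    for row in db.events.aggregate(pipeline):
        print(row["_id"], row["events"])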
I have no intention of defending MongoDB because what do I know, never worked with it in real life - but just out of curiosity I took the free courses they offer (https://university.mongodb.com/) and I find that a sizable share of the complaints about MongoDB come from people who don't seem to have learned much about the product they are using. It's like people complaining their new truck behaves badly in water.
A lot of critics seem to have chosen MongoDB when they needed a SQL DB from day one. If you need full flexibility to (re)combine data you need SQL, for example. A document store isn't "schema-less" at all - much of the schema is built-in and very inflexible after that.
I actually wonder what the correlation is between PHP use and MongoDB use. They both have an attitude that mistakes ease with simplicity, a philosophy that puts correctness way down the priority list, and an easy introduction with a heavy ongoing maintenance tax.
Yes, it is easy to use. But too bad it also fails "transactions" silently, so you don't even know if your changes were "committed" or not. Don't worry, it only happens every once in a while so it's not a big deal...
Unless you are coinbase or an organization that deals with money/bitcoins/etc and you need ACID compliant transactions so that "debits/credits" don't just magically disappear.
When the bitcoin craze was in full swing, Coinbase had all kinds of problems due to their MongoDB backend.
It's pretty easy to use... until you have to normalize data and query across one or two joins. I've been forced to build with mongo for the past few months (still not sure why) and I can't think of a single valid use-case for this rubbish.
If you need denormalized/distributed caching, Redis does a good job.
If you need to store some unstructured json blobs, postgres and now sql server 2016 can do that.
If you need reliable syncing for offline capable apps, you probably want CouchDB.
If you need real time, use Rethink
Obviously, relational data belongs in a relational database.
I think the problem is that all of these databases do one or two things really well. Mongo tries to do all of these things, and does so very poorly.
I (used to) hear lots of good stuff, but the type of devs were always hype driven. Asking for a reason why Mongo was used, the reply sounded just like the marketing hype on Mongo's homepage - lots of buzzwords and catchphrases ("big data", "schemaless") with no substance to the reason for choosing it.
> The fundamental problem is that MongoDB provides almost no stable semantics to build something deterministic and reliable on top of it.
That said, it is really, really easy to use.