Last week we released gh-ost , GitHub's online schema migration tool for MySQL . As with other open source releases in the MySQL ecosystem, this release was echoed by several "Why not PostgreSQL?" comments. Having been active in open source since many years now, I'm familiar with these responses, and I find this is a good time to share my thoughts. Why? XKCD knows the answer:
I picked one post I wish to address (latest commit: 3dfbd2cd3f5468f035ec86442d2c670a510118d8 ). The author invested some time writing it. It nicely summarizes claims I've heard over the years, as well as some prejudice. Through responding to this post I will be generalizing thoughts and impressions to address the common reactions. Dear @brandur, let's grab a beer some day; I fundamentally disagree with your post and with its claims.
This is not an anti-PostgreSQL post
Disclosure: I appreciate PostgreSQL. I always wanted to be a proficient PostgreSQL user/DBA. I think this project is one of the finest examples of quality open source. It sets some high standards for open source in general, and for RDBMS in particular. I am not emotionally attached to MySQL to the extent that I would hate everything that is not called "MySQL". I never understood this approach. I am not interested in religious wars. I'm an engineer and this post follows engineering guidelines.
gh-ostdelivers powerful online schema migrations to MySQL, differentiating itself from existing tools by being triggerless, auditable, controllable, testable, imposing low workload on the migrated master. It addresses the same problem addressed by existing tools as of 2009.
The most basic premise of this post is: MySQL does not have feature X, PostgreSQL does, therefore PostgreSQL.
We'll discuss the truth of the above shortly, but let's first discuss the essence of this guideline.
It should be generally agreed that a statement of the form " A doesn't have feature X therefore B " is incomplete. We understand complex systems have varying feature sets.
MySQL has some features PostgreSQL doesn't. Take, as example, the feature R : MySQL has got it since ages ago, and yet PostgreSQL is slow to adapt it, and relied on 3rd party solutions for many years. MySQL's implementation of R is far more elaborate than PostgreSQL's.
But if we follow the rule suggested above, we must now migrate from PostgreSQL to MySQL, because PostgreSQL does not have feature R (or one of its variants). Infinite loop!
In practice, we evaluate the pros and cons, the features the products A and B have or do not have. Which feature is more important to us? X or R ? Is one of them fundamentally required for our operation? Can we work around it if we don't get it directly from the product? That, and experimentation, is the way an engineer should approach a choice of technology.
In the world of RDBMS we are interested, among others and in no particular order, in write latency and throughput, read scale out, durability, loss of data in the event of failure, failure promotion schemes, DR, consistency, SQL features, etc. by this list alone it is impossible to claim "PostgreSQL is better than MySQL" nor "MySQL is better than PostgreSQL" .
The particular claim and advice
The author suggests we should be using PostgreSQL because it inherently solves the problem for which we embarked on developing gh-ost . That is, that PostgreSQL supports true online schema changes. That statement is misleading and I resent the way that statement is delivered.
The post does not mention that PostgreSQL supports online schema changes for a limited set of operations. I went ahead to double check with the PostgreSQL documentation. I love the documentation! It is detailed and honest. I'm thoroughly satisfied that PostgreSQL only supports a limited set of online operations. I go against the fact the post does not mention that, and leads us to "understand" PostgreSQL has a magic wand.
Online table operations are supported in PostgreSQL for adding/removing indexes, for removing columns, and for adding columns under certain conditions. As an example, adding a nullable column is an online operation, whereas adding a column with default value is a locking operation.
A very big part of our schema migration including adding/removing indexes and adding columns. Many of these operations fall under the lockless, online operations PostgreSQL would allow. However a large part of our migrations also consists of adding columns with default values, changing data types (e.g. From INT to BIGINT ), changing character characteristics (e.g. length of text column), space reclamation, and others. These changes are blocking in PostgreSQL.
The premise of the post now turns to: it's a pity you invested time and money in developing a tool that solves 100% of your problems when you could have switched to PostgreSQL which would solve 40% of your problems!
If I were to insist my fellow engineers at GitHub migrate to PostgreSQL in order to solve the migration problem, and then, once this technical transition is complete let them know 60% of the migrations are not at all addressed and that we are still stuck with the very same problem we started with, I would not be a very popular engineer.
"the same advancements never happened in MySQL" is a false statement.
As mentioned in the gh-ost announcement, MySQL as of 5.6 does indeed support online, non blocking alter table. In fact, it supports many more variants of online alter table than PostgreSQL does (however, noticeable difference is that PostgreSQL makes those changes transactional whereas MySQL does not).
Also as mentioned, one of the shortcomings of the MySQL implementation is that it is aggressive, and may cause a high impact on the running master. In my understanding the PostgreSQL implementation is no different. There's nothing to cause throttling, to play nice with the running workload. Yes, in PostgreSQL you can Ctrl-C your ALTER , but who wants to Ctrl-C a 10 hour operation?
gh-ostaddresses that as well. Its throttling gives super powers over the migration process, kicking in throttling based on master load, replication lag, various human controlled criteria, effectively making it very lightweight on the master.
"there's a level of seemingly willful misdirection here that I just can't wrap my head around"
XKCD to the rescue again:
I dare say this is not the kind of thing a person would say in person, and the accusation is rather severe. It is also ironic. Dear author, consider:
- PostgreSQL does not really solve 100% of the problem gh-ost does, and yet you claim we'd be better off with PostgreSQL.
- MySQL does indeed provide more variants of online alter table than PostgreSQL does, and yet you claim it has no online alter capabilities.
- I might claim there's a seemingly willful misdirection in your post. I might claim nowhere in your write up do you mention the deficiencies in PostgreSQL.
Instead, I'd rather like to think that you, and others, are misinformed, basing your opinion on rumors and prejudice instead of facts.
I also observe that people all around the world like to willfully differentiate themselves from others. Even in tech. this is the topic for another post, but consider explaining to a complete outsider, say your doctor, why people who work in tech, are engineers, work with data, work with databases, work with relational databases, work with open source relational databases, people who have so much shared experience, still insist on "us and them", and seek to see the negative in the other. Sheesh.
Paraphrasing a memorable sarcastic quote from the movie Erin Brockovich : the fact so many of the largest tech companies today choose to use MySQL as their backend database does not mean it's crap.
No. We really think MySQL does a good job. It is not perfect. We work around some of its limitations.
The claim "you'd be better off with PostgreSQL" (not a quote from aforementioned post) cannot be made without understanding the specific workload of a company/product. It would be presumptuous of me to approach a PostgreSQL based company and say "oh hey why use PostgreSQL? You'd be better off with MySQL!"
It makes perfect sense to say "PostgreSQL handles problem X better than MySQL" or even "if your one most important requirement is X, you should switch to PostgreSQL". Otherwise claiming one database is wholly better than the other cannot be taken seriously.
Deficiencies? Any project of scale has deficiencies. It is granted. We observe and measure, and take features and deficiencies into calculation, and that makes for good engineering.
- If you're using PostgreSQL and it works well for you, you're doing the right thing.
- It you're using MySQL and it works well for you, you're doing the right thing.
- If you found that PostgreSQL works better for you where MySQL does not, and you decided to switch over, you're doing the right thing.
- If you found that MySQL works better for you where PostgreSQL wasn't, and you decided to switch over, you're doing the right thing.
- If you found that PostgreSQL works better for you where MySQL wasn't, but decided to stick with MySQL because migrating would be too costly, you're doing the right thing.
- If you found that MySQL works better for you where PostgreSQL wasn't, but decided to stick with PostgreSQL because migrating would be too costly, you're doing the right thing.
- If you pick one over the other because of licensing constraints, you're doing the right thing.
- If you choose to switch over because of rumors, prejudice, FUD, politics, religion, you're doing it wrong.
Final personal note, on pride
"Yesterday, GitHub broadcasted an indomitable sense of self-satisfaction far and wide..."
Oh hey, XKCD again . But I would like to ask an honest question: if some pg-gh-ost were to be released, a tool that would solve 100% of your PostgreSQL migrations requirements, doing it better than PostgreSQL does, covering all cases, throttling as your daily sqoop imports kick in, as your rush hour traffic kicks in, giving you far and wide greater control over the migration process, what would you do?
Would you write an offensive post filled with accusations, ranting about the deficiencies of PostgreSQL and how people even consider using such a database that needs a third party tool to do a better job at migrations? Would you tweet something like "Or... Use MySQL!"
Or would you embrace a project that enriches the PostgreSQL ecosystem, makes it even a greater database to work with, understanding PostgreSQL is not yet perfect and that more work need to be done?
I take pride in my craft and love making an impact; if we ever do meet for beer I'm happy to share more thoughts.
Peace on earth