slides are up for my talk, "Database Scalability Patterns", which I gave this week at OSCon 2010. You can get them from the OSCon page, from slideshare, or just watch it below
PlanetMySQL Voting: Vote UP / Vote DOWN
Welcome to Log Buffer, the weekly roundup of DBA industry happenings.
Read on for the latest updates in Log Buffer #193. Don’t forget, we’re always looking for volunteer editors to publish and host an issue of Log Buffer. If you’d like this to be you, contact the Log Buffer coordinator.
ODTUG/Kaleidoscope 2010 roundup:
Sheeri Cabral has posted slides and a summary of the first ever MySQL track at ODTUG/Kaleidoscope, citing a successful event.
Tim Hall, on Oracle-Base blog gives a daily report on the event summarizing his take on day 1 & 2, 3, 4, and 5.
Following the ODTUG/Kaleidoscope Oracle ACE Director’s briefing Alex Gorbachev continues the conversation on best practices with a blog post and encourages you to add your comments.
Marc Fielding, of Pythian’s consulting group suggests the following reads this week:
Pinal Dave who tries to clear up confustion and misconceptions about Statistics on SQL Blog Authority.
SQL Master noting a SQL Server 2008 Security gotcha, from a discussion he has on sql server security during interactions with the SASSUG user group.
On SQL Aloha, a post from Brad McGhee, who poses the question: Does your company have a SQL server DR plan? A quick poll he conducted provides scary results and he offers a list of resources for DBAs without a complete and tested DR plan in place.
Sramana Mitra, who writes about Oracle’s next move: data security, now that the Sun acquisition is complete.
OracleNerd outlining strategies for how to receive free passes to Oracle OpenWorld 2010.
Peter Finnigan talking about database forensics on his Oracle security blog, and the release of a new redo log mining tool to extract DDL from redo logs.
In other news, Willie Favero gives readers a heads up that the DB2 Catalog and directory are about to go through some major changes.
Guiseppe Maxia, The Data Charmer, learns a new programming language as he tries to integrate MySQL Sandbox and Cluster.
Dave Page on his Postgres blog, highlights what SQL injection attacks are and how DBAs can avoid them after another malicious attack targeting websites running Microsoft Internet Information Server (IIS).
Happy Canada Day for those north of the border, and for our friends to the south, Happy Independence Day and enjoy a great 4th of July weekend.
OpenSQLCamp is less than 4 months away, and I have finally gotten around to updating the site. Special thanks go to Bradley Kuzsmaul and the folks at Tokutek for getting the ball rolling and making the reservation at MIT. Using MIT means that we will have *free* reliable wireless guest access and projects.
OpenSQL Camp is a free unconference for people interested in open source databases (MySQL, SQLite, Postgres, Drizzle), including non-relational databases, database alternatives like NoSQL stores, and database tools such as Gearman. We are not focusing on any one project, and hope to see representatives from a variety of open source database projects attend. As usual I am one of the main organizers of Open SQL Camp (in previous years, Baron Schwartz, Selena Deckelmann and Eric Day have been main organizers too; this year Bradley Kuzsmaul is the other main organizer). The target audience are users and developers, but others are encouraged to attend too. There will be both presentations and hackathons, with plenty of opportunities to learn, contribute, and collaborate!
I have updated the main Boston 2010 page at http://opensqlcamp.org/Events/Boston2010/ with travel and logistics information, including links to:
Register — it’s free and easy, and you can always change your mind later!
Maybe you have an idea for a session you would like to see, or a session you would like to give? If so, you can note it on the sessions page. This will give everyone a sense of what type of presentations will be there. I have started by putting 2 sessions I am willing to give and a third at the bottom for one I’d like to see, to give everyone an idea of both types of descriptions.
Probably the most important link right now is the way we keep OpenSQLCamp free for all attendees – sponsor or donate to the conference! Any donation amount is accepted, and all donations are tax-exempt to the fullest extent of the law. Businesses and organizations will be listed as sponsors if they make a donation of $250 or more, and individuals will be listed as sponsors if they make a donation of $100 or more. More information on sponsor benefits, including where to send a graphic to, at the link.
There is a preliminary schedule, up until the conference itself it will only show the agenda of the conference — how many rooms and what time the presentations are supposed to be. During and after the conference we will update this schedule page with the titles, presenters and links to any notes/videos/audio taken.
If you have any questions, please do not hesitate to ask on the mailing list or by posting a comment here.
watch -n 5 'psql -U robert -h mk -d wario -c "select procpid, waiting, query_start, current_query from pg_stat_activity where current_query ~ '\''^COPY'\'' ORDER BY procpid"'
Every 5.0s: psql -U robert -h mk -d wario -c "select procpid, waiting, query_start, curren... Mon Apr 05 15:22:03 2010
procpid | waiting | query_start | current_query
---------+---------+-------------------------------+-------------------------------------------------------------------------------------------
12706 | f | 2010-04-05 15:21:34.565754-05 | COPY wario.tblhits_p2005_1010 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
12728 | f | 2010-04-05 15:21:39.563847-05 | COPY wario.tblhits_p2005_1017 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
12732 | f | 2010-04-05 15:21:43.43266-05 | COPY wario.tblhits_p2005_1024 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
12735 | f | 2010-04-05 15:21:48.385349-05 | COPY wario.tblhits_p2005_1031 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
(4 rows)
Every 5.0s: psql -U robert -h mk -d wario -c "select procpid, waiting, query_start, curren... Mon Apr 05 15:22:19 2010
procpid | waiting | query_start | current_query
---------+---------+-------------------------------+-------------------------------------------------------------------------------------------
12728 | f | 2010-04-05 15:21:39.563847-05 | COPY wario.tblhits_p2005_1017 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
12732 | f | 2010-04-05 15:21:43.43266-05 | COPY wario.tblhits_p2005_1024 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
12735 | f | 2010-04-05 15:21:48.385349-05 | COPY wario.tblhits_p2005_1031 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
12836 | f | 2010-04-05 15:22:10.746129-05 | COPY wario.tblhits_p2005_1107 (partner, bowser, coins, hitdate, tld, affiliate) TO stdout;
(4 rows)
I've just publicly released another Postgres-related script, this one called "boxinfo". Basically, it gathers information about a box (server), hence the catchy and original name. It outputs the information it finds into an HTML page, or into a MediaWiki formatted page.
The goal of boxinfo is to have a simple, single script that quickly gathers important information about a server into a web page, so that you can get a quick overview of what is installed on the server and how things are configured. It's also useful as a reference page when you are trying to remember which server was it that had Bucardo version 4.5.0 installed and was running pgbouncer.
As we use MediaWiki internally here at End Point (running with a Postgres backend, naturally), the original (and default) format is HTML with some MediaWiki specific items inside of it.
Because it is meant to run on a wide a range of boxes as possible, it's written in Perl. While we've run into a few boxes over the years that did not have Perl installed, the number that had any other language you choose (except perhaps sh) is much greater. It requires no other Perl modules, and simply makes a lot of system calls.
Various information about the box is gathered. System wide things such as mount points, disk space, schedulers, packaging systems are gathered first, along with versions of many common Unix utilities. We also gather information on some programs where more than just the version number is important, such as puppet, heartbeat, and lifekeeper. Of course, we also go into a great amount of detail about all the installed Postgres clusters on the box as well.
The program tries it's best to locate every active Postgres cluster on the box, and then gathers information about it, such as where pg_xlog is linked to, any contrib modules installed, any interesting configuration variables from postgresql.conf, the size of each database, and lots of detailed information about any Slony or Bucardo configurations it finds.
The main page for it is on the Bucardo wiki at http://bucardo.org/wiki/Boxinfo. That page details the various command line options and should be considered the canonical documentation for the script. The latest version of boxinfo can be downloaded from that page as well. For any enhancement requests or problems to report, please visit the bug tracker at http://bucardo.org/bugzilla/.
What exactly does the output look like? We've got an example on the wiki showing the sample output from a run against my laptop. Some of the items were removed, but it should give you an idea of what the script can do, particularly with regards to the Postgres information: http://bucardo.org/wiki/Boxinfo/Example
The script is still a little rough, so we welcome any patches, bug reports, requests, or comments. The development version can be obtained by running:git clone git://bucardo.org/boxinfo.git
It's been interesting watching the MySQL drama unfold, but I have to take issue when people start trying to drag Postgres into it again by spreading FUD (Fear, Uncertainty, and Doubt). Rather than simply rebut the FUD, I thought this was a good opportunity to examine the strength of the Postgres project.
Monty recently espoused the following in a blog comment:"...This case is about ensuring that Oracle doesn't gain money and market share by killing an Open Source competitor. Today MySQL, tomorrow PostgreSQL. Yes, PostgreSQL can also be killed; To prove the case, think what would happen if someone managed to ensure that the top 20 core PostgreSQL developers could not develop PostgreSQL anymore or if each of these developers would fork their own PostgreSQL project."
Later on in his blog he raises the same theme again with a slight bit more detail:
"Note that not even PostgreSQL is safe from this threat! For example, Oracle could buy some companies developing PostgreSQL and target the core developers. Without the core developers working actively on PostgreSQL, the PostgreSQL project will be weakened tremendously and it could even die as a result."
Is this a valid concern? It's easy enough to overlook it considering the Sturm und Drang in Monty's recent posts, but I think this is something worth seriously looking into. Specifically, is the Postgres project capable of withstanding a direct threat from a large company with deep pockets (e.g. Oracle)?
To get to the answer, let's run some numbers first. Monty mentions the "top 20" Postgres developers. If we look at the community contributors page, we see that there are in fact 25 major developers listed, as well as 7 core members, so 20 would indeed be a significant chunk of that page. To dig deeper, I looked at the cvs logs for the year of 2009 for the Postgres project, and ran some scripts against them. The 9185 commits were spread across 16 different people, and about 16 other people were mentioned in the commit notes as having contributed in some way (e.g. a patch from a non-committer). So again, it looks like Monty's number of 20 is a pretty good approximation.
However (and you knew there was a however), the catch comes from being able to actually stop 20 of those people from working on Postgres. There are basically two ways to do this: Oracle could buy out a company, or they could hire (buy out) a person. The first problem is that the Postgres community is very widely distributed. If you look at the people on the community contributors page, you'll see that the 32 people work for 24 different companies. Further, no one company holds sway: the median is one company, and the high water mark is a mere three developers. All of this is much better than it was years ago, in the total number and in the distribution.
The next fly in the ointment is that buying out a company is not always easy to do, despite the size of your pockets. Many companies on that list are privately held and will not sell. Even if you did buy out the company, there is no way to prevent the people working there from then moving to a different company. Finally, buying out some companies just isn't possible, even if you are Oracle, because there are some big names on the list of people employing major Postgres developers: Google, Red Hat, Skype, and SRA. Then of course there is NTT, which is a really, really big company (larger than Oracle). NTT's Postgres developers are not always as visible as some of the English-speaking ones, but NTT employs a lot of people to work on Postgres (which is extremely popular in Japan).
The second way is hiring people directly. However, people can not always be bought off. Sure, some of the developers might choose to leave if Oracle offered them $20 million dollars, but not all of them (Larry, I might go for $19 million, call me :). Even if they did leave, the depth of the Postgres community should not be underestimated. For every "major developer" on that page, there are many others who read the lists, know the code well, but just haven't, for one reason or another, made it on to that list. At a rough guess, I'd say that there are a couple hundred people in the world who would be able to make commits to the Postgres source code. Would all of them be as fast or effective as some of the existing people? Perhaps not, but the point is that it would be nigh impossible to thin the pool fast enough to make a dent.
The project's email lists are as strong as ever, to such a point that I find it hard to keep up with the traffic, a problem I did not have a few years ago. The number of conferences and people attending each is growing rapidly, and there is a great demand for people with Postgres skills. The number of projects using Postgres, or offering it as an alternative database backend, is constantly growing. It's no longer difficult to find a hosting provider that offers Postgres in addition to MySQL. Most important of all, the project continues to regularly release stable new versions. Version 8.5 will probably be released in 2010.
In conclusion, the state of the Postgres project is in great shape, due to the depth and breadth of the community (and the depth and breadth of the developer subset). There is no danger of Postgres going the MySQL route; the PG developers are spread across a number of businesses, the code (and documentation!) is BSD, and no one firm holds sway in the project.
"Amazon RDS provides a fully featured MySQL database, so the code, applications, and tools that you use today with your existing MySQL databases work in Amazon RDS without modification. The service automatically handles common database administration tasks, such as setup and provisioning, patch management, and backup."
"High Availability Offering — For developers and business who want additional resilience beyond the automated backups provided by Amazon RDS at no additional charge. With the high availability offer, developers and business can easily and cost-effectively provision synchronously replicated DB Instances in multiple availability zones (AZ’s), to protect against failure within a single location."
* Double Extra Large: 34.2 GB memory, 13 ECU (4 virtual cores with 3.25 ECU each), 850 GB storage, 64-bit platform
* Quadruple Extra Large: 68.4 GB memory, 26 ECU (8 virtual cores with 3.25 ECU each), 1690 GB storage, 64-bit platform