Archive for the ‘devops’ Category

CAOS Theory Podcast 2012.01.20

Январь 20th, 2012

Topics for this podcast:

*Hadoop v1.0 and year ahead
*Oracle-Cloudera deal for more Hadoop
*Oracle’s ‘Sun spot’ with Solaris
*Open Source M&A outlook for 2012
*Our new MySQL/NoSQL/NewSQL survey

iTunes or direct download (28:49, 4.9MB)


PlanetMySQL Voting: Vote UP / Vote DOWN

CAOS Theory Podcast 2011.11.11

Ноябрь 11th, 2011

Topics for this podcast:

*Continuent extends MySQL replication to Oracle Database
*CFEngine updates server automation software
*Devops moving mainstream
*Neo Technology integrates with Spring
*451 CAOS report from Hadoop World

iTunes or direct download (26:56, 4.6MB)


PlanetMySQL Voting: Vote UP / Vote DOWN

Why generalists are better at scaling the web

Октябрь 25th, 2011

Recently at Surge 2011, the annual  conference on scalability  and performance, Google's CIO Ben Fried gave an illuminating keynote address. His main insight was that generalists are the people that will lead engineering teams in successfully scaling the web.

In a world where the badge of Specialist or Expert is prized, this was refreshing perspective from an industry bigwig. As tech professionals, or any professional for that matter, we don't welcome the label of generalist. The word suggests a jack-of-all-trades and master of none. But the generalist is no less an expert than the specialist. Generalists can get their hands greasy with the tools to fix bugs in the machine but they especially good at mobilizing the machine itself; with their talents of broad vision, and perspective they can direct an entire team to accomplish tasks efficiently. This ability to see big-picture can not be underestimated especially during times of crisis or pressure to meet targets. For a team to scale the web effectively, you're going to need a good mix of both types of personalities.

Picking out the potential generalist

Startups wanting to achieve scalability  are face with huge pressure to do more with limited budgets.  In bringing on new engineers, they must hire people who have the programming skills to realise their big idea. Ideally these programmers should also have some architectural vision, a knowledge of web operations, and performance as that application becomes popular.  And what of maintaining that large infrastructure as it grows?

So the question for a startup is how do you spot or hire generalists?  In the book, REWORK by  Jason Fried and David Heinemeier Hansson, the authors emphasize good writers and good teachers.  Their point is that in order to teach an idea or concept you have to understand it thoroughly and be able to step into someone elses shoes in order to explain it from their vantage point.

This is in large part the skill that Ben Fried was speaking about at Surge. To borrow his method of using "Disaster Porn" as a way to illustrate a point, we have a story of our own.

Our own disaster porn

About five years ago we worked for a firm who was faced with ongoing challenges of growth.  Their user base was growing by 25%-50% per quarter but they were suffering from outages because of that growth.  What's more one of their top engineers was leaving to join another company.  They took the opportunity to bring us on board to assess the entire infrastructure.

We looked over the architecture and were surprised at every turn.  Although they had a lot of engineers on staff, they were all tasked with building features, and responding to ongoing business requirements.  None were given any operations responsibilities. There was a very obvious lack of leadership. so you can imagine how this turned out to be a recipe for a fine mess. One day we'd see new servers being added at random, another day we'd witness haphazard decisions with what technologies to use or what what versions of frameworks to adopt. In effect, each engineer was making decisions without considering the consequences on the whole.

The infrastructure wound up being built on two different webserver platforms, three - count 'em - three different programming languages and frameworks, and three MySQL databases scattered about on different machines. After a few hours discussing the architecture with the team, I put together a plan that framed the architecture around three simpler tiers.  Two included the standard load balanced webserver tier, and backend database tier, and then a third to manage batch jobs and building static assets and media files.

A generalist solution

Our push then was to standardize on one type of webserver, one version of each language stack, and consolidate all the databases into one instance.  This huge simplification meant that they could add replication to the database tier, eliminating single points of failure, providing redundancy for all business services.  This in itself was a major achievement. We left them with some major problems solved, while offering a new direction, and a better handle on the remaining challenges. What the company had lacked was not engineering know-how, but rather a generalist's perspective.  The engineers had focused too much on immediate tasks, locked on detail, but losing sight of the big picture.

As more companies move their applications to the cloud, some carefully and some not, we anticipate many more disaster scenarios such as these.  This speaks strongly to the rising cult of DevOps and its effort towards broader skills and collaboration among both developers and operations teams. The good thing to come out of it is that cleaning up messes such as these will force us to hone our strategic thinking and organizational skills, possibly making generalists out of many more of us.

 


PlanetMySQL Voting: Vote UP / Vote DOWN

CAOS Theory Podcast 2011.09.30

Сентябрь 30th, 2011

Topics for this podcast:

*Cloud M&A potential around OpenStack
*Oracle’s commercial extensions for MySQL
*Puppet Labs rolls out Enterprise 2.0, hosts PuppetConf
*Basho bolsters Riak distributed data store in NoSQL race
*Our latest special CAOS report, ‘The Changing Linux Landscape’

iTunes or direct download (25:59, 4.4MB)


PlanetMySQL Voting: Vote UP / Vote DOWN

Paying Attention Pays Off

Август 15th, 2011
I often run my ops like I take care of data; a bit overzealously. Case in point, when setting up a new database, I like to throw on a metric for database size, which gets turned into both a graph for trending, but also an alert on database size. Everyone is always on board with trending database size in a graph, but the alert is one people tend to question. This is not entirely without justification.

On a new database, with no data or activity, deciding when to alert is pretty fuzzy. When we set up a new client within our managed hosting service, I usually just toss up an arbitrary number, like 2GB or something. The idea isn't that a 2GB database is a problem, it's that when we cross 2GB, we should probably take a look at the trending graph and do a projection. Depending on how things look, we'll bump up the threshold on the alert to a new level, based on when we think we might want to look at things again. For example, in this graph we take a month long sample, and then project it out for three months. We can then set a new threshold somewhere along that line.

projected db size

While this is good for capacity planning, there's more that can be gained from this process. The act of alerting forces us to pay attention. And if we get notices before our expectations, we go back in and re-evaluate the data patterns. Of course, some times people will question this. Getting a notice that your database has passed 4GB can seem pointless when you have 100+ GB of free space on your disks. And besides, isn't that what free space monitors are for?

Here is a graph of another of our clients database growth. Their data size is not particularly large (don't confuse scalability with size; it doesn't take a large database to have scalability issues), but what's important is that we kept getting notices that the size was growing, and when talking with the developers, no one thought it should be growing at nearly this rate. Eventually we were able to track down the problem to purging job that had gone awry. Once that was fixed, the growth pattern leveled off completely (and the database size returned to the tiny amount that was expected!)

Fix DB Size



PlanetMySQL Voting: Vote UP / Vote DOWN

Talking about Gearman at Etsy Labs

Август 5th, 2011
I find myself flying to New York on Monday for some dealnews related business. Anytime I travel I try and find something fun to do at night. (Watching a movie by myself in Provo, Utah was kinda not that fun.) So, this week I asked on Twitter if anything was happening while I would be in town. Anything would do. A meetup of PHP/MySQL users or some design/css/js related stuff for example. Pretty much anything interesting. Well, later that day I received an IM from the brilliant John Allspaw, Senior VP of Technical Operations at Etsy. He wanted me to swing by the Etsy offices and say hi. Turns out it is only a block away from where I would be. Awesome! He also mentioned that he would like to have me come and speak at their offices some time. That would be neat too. I will have to plan better next time I am traveling up there.

Fast forward another day. I get an email from Kellan Elliott-McCrea, CTO of Etsy wanting to know if I would come to the Etsy offices and talk about Gearman. At first I thought "That is short notice, man. I don't know that I can pull that off." Then I remembered the last time I was asked to speak at an event on short notice based off a recommendation from John Allspaw.

It was in 2008 for some new conference called Velocity. That only turned out to be the best conference I have ever attended. I have been to Velocity every year since and this year took our whole team. In addition, I spoke again in 2009 at Velocity, wrote a chapter for John's book Web Operations that was released at Velocity in 2010 and was invited to take part in the Velocity Summit this year (2011) which helps kick off the planning for the actual conference. The moral of that story for me is: when John Allspaw wants you to take part in something, you do it.

In reality, it was not that tough a decision. Even without John's involvement, I love the chance to talk about geeky stuff. The Etsy and dealnews engineering teams are like two twins separated at birth. Every time we compare notes, we are doing the same stuff. For example, we have been trading Open Source code lately. They are using my GearmanManager and we just started using their statistics collection daemon, statsd. So, speaking to their people about what we do seem like a great opportunity to share and get input.

The event is open to the public. So, if you use Gearman, want to use Gearman, or just want to hear how we use Gearman at dealnews, come here me ramble on about how awesome it is Tuesday night in Dumbo at Etsy Labs. You can RSVP on the event page.

Best Practices for Gearman by Brian Moon
Etsy Labs
55 Washington St. Ste 712
NY 11222

Tuesday, August 09, 2011 from 7:00 PM - 10:00 PM (ET)
PlanetMySQL Voting: Vote UP / Vote DOWN

Velocity 2010 In Review

Июнь 28th, 2010
I just got back from Velocity for the third straight year. I have been to all three of them which is kind of a neat little club to be in. The first one only had maybe 300 people. This year there were over 1,000 attendees. Registration was shut down by the fire code for the rooms we were using. Most sessions had standing room only. It was awesome.

The people that talk at Velocity are really smart. I am always humbled by the likes of John Allspaw. He and I see eye to eye on a lot, but he is so much better at explaining to people and showing them how to make the ideas work. I wish I had his charisma when at the podium. I was lucky enough to write a chapter in a book for John this year. He and Velocity co-chairperson Jesse Robbins organized and authored a book titled Web Operations that debuted at the conference. I basically just told and expanded on my Yahoo story. John loves that story for some reason. I was happy to be a part of it. So many smart people in that book.

The IE9 technology preview dropped while we were there. HTML 5, CSS 3 and more in there. One feature where Microsoft is actually ahead of the curve is in a new DOM level measurement feature. Basically they expose statistics via the DOM about the time it takes to do different things in the page. The other browser vendors in attendance (Google and Mozilla) vowed to support the same data. Another big advancement of IE9 is the heavy use of the GPU for rendering the pages. They have a real advantage here. They are the only browser vendor that is now locked to one operating system. IE9 will require Vista or higher. They can really max out the system for faster rendering.

As usual some of the best content was in the hall ways and bar. We hung out with Theo Schlossnagle from OmniTI and talked about Reconnoiter. It is a kind of Cacti/Ganglia/Nagios all in one. I got to see the Six Apart guys again this year. That is becoming an annual thing. I shared our new Gearman assisted proxy with them. They do some similar stuff for TypePad. More on that proxy later this year. I met a guy from CloudTest. It sounds like a really good use of on-demand cloud resources. I am gonna talk with them about some possible testing.

Membase also dropped while we were there. Most of the persistent key/value stores I have used have disappointed me or just been way too complex for our needs. We don't want a memcached replacement. It does its job damn well. I just need a place to store adhoc data for various applications. Membase is promising because the guys that wrote it are core memcached contributors. There is a company behind it, so it is not as inviting as Drizzle. But, the code is on GitHub so it is more open than say MySQL. Time will tell.

If you have not been to Velocity I encourage you to go next year. It is right for all types of people in the web business. Developers can learn about performance in new ways that will change they way they write code. Operations can learn techniques to make their work day much less painful. Everyone will learn how to empower their business to achieve the goals of the business.

PlanetMySQL Voting: Vote UP / Vote DOWN

Linux Open Administration Days 2010

Апрель 20th, 2010

So about 4 monts ago there was the crazy idea to start a new FOSS event in Belgium targeted at sysadmins.

What started out as an event for local people to meet local people with some local speakers actually ended up being a small local event with some top international speakers on onfiguration mananagement and system administration mixed with a bunch of good local ones !

I had the honour to open the conference with an extremely short version of the Devops talk I gave earlier last year.. extremely short as I knew that over the course of the weekend the topic would reoccur a lot.

We had the first european talk on Chef, by Joshua Timberman, and we had Puppet talks amongst by Dan Bode from Puppetlabs and CFengine talks , devops was a frequently dropped word,

We had a book raffle where we handed out O'Reilly's .. we had a great free pizza party (got the idea from the saturday pizza event at LCA 2005) , and we had some free beer. Sounds like a good combination for a geeky weekend.

Apart from the regular talks there were plenty of Open Spaces where interesting topics were discussed ... we had spaces on Open Source vs Open Core , strong voices were heard when we discussed what we should do with the Open Core companies that claim to value Open Source , some people think we should actually list the fauxpensource ones somewhere and make sure the world knows about them

We had an awesome configuration management discussion session discussing Chef vs Puppet vs CFengine . And much much more ...

Some people owe me plenty of Sushi as I had to do my MySQL HA talk before their Managing MySQL talk , but other than that .. things just went fine..

Trackback URL for this post:

http://www.krisbuytaert.be/blog/trackback/998

PlanetMySQL Voting: Vote UP / Vote DOWN

UKUUG Spring Conference 2010

Апрель 7th, 2010

Last week I was in Manchester for the 2010 UKUUG Spring Conference, right .. make that 2 weeks ago , :)

The UKUUG usually hosts the more interesting conferences around ... , it's not just the schedule that attrackts me , yes there's the strong focus towards Larger Scale Unix (and mostly Linux) deployments and how to manage them, but there's also the opportunity to chat in real life with the Devops from across the chunnel.

Spending time with R.I.Pienaar, Julian Simpson, Simon Wilkinson , Alex Davies , Simon Riggs , Josette, and many others is always fun .

As I was in town early I went to the preconference beer meetup and met with a lot of people and chatted about config management, virtualization and lots of other stuff ... after the pub the plan was to go for curries nearby .. and while walking to the , ahem Bus stop, I managed to recognise Ben Martin from meeting him back ages ago in Hamburg for LinuxKongress , always fun ..

Apart from having to jump on a bus and our group being split at the curry place , rather than being able to tell the latecomers where to walk to and being seeted upstairs with the whole group , the curries were interesting and fun.

As I had been pushing Simon Wardley on Twitter to submit a talk for the conference it was really great to finally see him present .. His talk was the perfect soft introduction to the conference ...

Simon's talk was followed by a talk on Security for the virtual datacenters, after I questionned the speaker if anyone actualy uses TPM outside an academic lab the talk suddenly changed into a commercial presentation for a Quack, nuff said.

The Ever energetic Matt S Trout talked about 21st century perl before Simon "Life is to short for SELinux" Wilkinson talked about his experiences in getting the openAFS crowd on Git.

Bummer Thierry Carrez didn't show us the real juice of UEC and just the installations of a Cloud Controller and a Node Controller , but he managed to do so in approx 30 minutes as promised .

A talk titled Coherent and Integrated Configuration of Virtual Infrastructures always cathces my eye.. however when that talk turns out to be a Coherent and Integrated configuration only within the Univerity of Edinborough (aka lcfg2) talk I`m dissapointed, specially since it pretty much didn't introduce any new concepts from the ones I introduced back in my Durham UKUUG presentation

Luckily Andrew Stribblehill gave a very interesting talk on MySQL scalability, in which I promised him some answers to his questions for the next day :)

The Conference dinner was without a doubt the best UKUUG dinner so far , no typical english "food", no weird location (Old Trafford, an abandoned warship) , but just a big chinese place and plenty of food !

I started thurday morning in the wrong track, I assumed to be in the Virtualization track, but I ended up in the Sun thinclient and Abusing Linux to serve weird desktops under the Green computing umbrella track, not my favourites ..

When Patrick and Julian started their Hudson hit my Puppet with a Cucumber talk (which featured some aweseom #devops content) I was a afraid that we'd had to look for a replacment PostgreSQL talk as Simon hadn't arrived yet .. Luckily he arrived in time for his presentation and he explained us about the new replication features that are slowly making it into PostgreSQL, one way ... log shipping ... not really up to par with other alternatives yet :(

So with no further ado .. here's the presentation I gave

PS. If at a Ukuug event and not sure about a person's name ... try Simon.. pretty good chance you're correct :)

Trackback URL for this post:

http://www.krisbuytaert.be/blog/trackback/997

PlanetMySQL Voting: Vote UP / Vote DOWN

DevOps at dealnews.com

Апрель 5th, 2010
I was telling someone how we roll changes to production at dealnews and they seemed really amazed by it. I have never really thought it was that impressive. It just made sense. It has kind of happened organically here over the years. Anyhow, I thought I would share.

Version Control

So, to start with, everything is in SVN. PHP code, Apache configs, DNS and even the scripts we use to deploy code. That is huge. We even have a misc directory in SVN where we put any useful scripts we use on our laptops for managing our code base. Everyone can share that way. Everyone can see what changed when. We can roll things back, branch if we need to, etc. I don't know how anyone lives with out. We did way back when. It was bad. People were stepping on each other. It was a mess. We quickly decided it did not work.

For our PHP code, we have trunk and a production branch. There are also a couple of developers (me) that like to have their own branch because they break things for weeks at a time. But, everything goes into trunk from my branch before going into production. We have a PHP script that can merge from a developer branch into trunk with conflict resolution assistance built in. It is also capable of merging changes from trunk back into a branch. Once it is in trunk we use our staging environment to put it into production.

Staging/Testing

Everything has a staging point. For our PHP code, it is a set of test staging servers in our home office that have a checkout of the production branch. To roll code, the developer working on the project logs in via ssh to a staging server as a restricted user and uses a tool we created that is similar to the Python based svnmerge.py. Ours is written in PHP and tailored for our directory structure and roll out procedures. It also runs php -l on all .php and .html files as a last check for any errors. Once the merge is clean, the developer(s) use the staging servers just as they would our public web site. The database on the staging server is updated nightly from production. It is as close to a production view of our site as you can get without being on production. Assuming the application performs as expected, the developer uses the merge tool to commit the changes to the production branch. They then use the production staging servers to deploy.

Rolling to Production

For deploying code and hands on configuration changes into our production systems, we have a staging server in our primary data center. The developer (that is key IMO) logs in to the production staging servers, as a restricted user, and uses our Makefile to update the checkout and rsync the changes to the servers. Each different configuration environment has an accompanying nodes file that lists the servers that are to receive code from the checkout. This ensures that code is rolled to servers in the correct order. If an application server gets new markup before the supporting CSS or images are loaded onto the CDN source servers, you can get an ugly page. The Makefile is also capable of copying files to a single node. We will often do this for big changes. We can remove a node from service, check code out to it, and via VPN access that server directly to review how the changes worked.

For some services (cron, syslog, ssh, snmp and ntp) we use Puppet to manage configuration and to ensure the packages are installed. Puppet and Gentoo get along great. If someone mistakenly uninstalls cron, Puppet will put it back for us. (I don't know how that could happen, but ya never know). We hope to deploy more and more Puppet as we get comfortable with it.

Keeping Everyone in the Loop

Having everyone know what is going on is important. To do that, we start with Trac for ticketing. Secondly, we use OpenFire XMPP server throughout the company. The devops team has a channel that everyone is in all day. When someone rolls code to production, the scripts mentioned above that sync code out to the servers sends a message via an XMPP bot that we wrote using Ruby (Ruby has the best multi-user chat libraries for XMPP). It interfaces with Trac via HTTP and tells everyone what changesets were just rolled and who committed them. So, in 5 minutes if something breaks, we can go back and look at what just rolled.

In addition to bots telling us things, there is a cultural requirement. Often before a big roll out, we will discuss it in chat. That is the part than can not be scripted or programmed. You have to get your developers and operations talking to each other about things.

Final Thoughts

There are some subtle concepts in this post that may not be clear. One is that the code that is written on a development server is the exact same code that is used on a production server. It is not massaged in any way. Things like database server names, passwords, etc. are all kept in configuration files on each node. They are tailored for the data center that server lives in. Another I want to point out again is that the person that wrote the code is responsible all the way through to production. While at first this may make some developers nervous, it eventually gives them a sense of ownership. Of course, we don't hire someone off the street and give them that access.  But it is expected that all developers will have that responsibility eventually.
PlanetMySQL Voting: Vote UP / Vote DOWN