Archive for the ‘drupal’ Category

Migrating users and content profiles

Декабрь 29th, 2010

For some time now I've been working on a Drupal site that consists mainly of scraped content from a proprietary, ASP based CMS from the late nineties. The Simple HTML Dom Parser, used from within a drush script, has been invaluable. It made scraping the old site content and importing it as Drupal nodes a relative breeze. (No access to the database used by the CMS, boo!)

Part of setting up the new site is importing users and their content profile nodes from a different Drupal site, that was setup a year or two ago to manage an event.

I had hoped there would be a way for me to export these users and their profile nodes from one Drupal to the other, but though I found modules to export one or the other, I might still end up with profile nodes that were no longer related to their users. Of course, that's pretty useless.

When I remembered I was also supposed to add all these users to a simplenews newsletter, the proverbial light bulb came on.

A spot of theory

I have previously done some bulk simplenews subscriptions by selecting Drupal data from one database table into another. In theory, that works fine across different databases as well. By inserting selected data from one Drupal's users table into another (avoiding uid clashes) I should be able to copy all users, keeping a predictable uid, so I could then do the same for their profile nodes.

First off, export the profile content type to the new Drupal site using the CCK content export module.

The grab the schema of all the tables I would need to transpose data to. These should be the same as the tables the data will come from. In my case the tables were: users, content_type_profile, node and node_revisions. The easiest way to grab the schema is via the mysql command line tool, using this query:

MySQL [source]> SHOW CREATE TABLE users\G

The \G means the data is shown in a single column, not in a column per field, which wouldn't fit across the terminal. Just repeat this for the other tables that need to be transposed and keep the output in a text editor window or something. Basically, this is just so you can easily copy and paste the field names.

At the bottom of each schema, you'll see a line like this one:

) ENGINE=InnoDB AUTO_INCREMENT=6221 DEFAULT CHARSET=utf8

The one from the node_revisions table is the important one. The AUTO_INCREMENT value tells us the next id that will be assigned on the primary auto_increment field in that table. In this case that is the vid field. It's important, because in all likelihood there is already a vid in this table for each value below the AUTO_INCREMENT value.

If you simply copy the node data across, chances some node.nid or node_revisions.vid will clash, causing the copy to break, leaving you with a cleanup to do. They do need to be included in the copy, though, because they are the way the user, her profile node and its CCK fields are linked together. If I simply copied all the other table columns and had the database assign new uids, nids and vids, I'd have a mess of information without its relational links intact.

My table has at most 6220 node revisions, so if I start with 10000 as the nid and vid for the copied nodes, there will never be a clash. Because I already had some users on the new Drupal, I decided to restart all copied user uid from 10000 as well.

At the end of the day the user.uid, node.nid and node_revisions.vid are just automatically assigned numbers, so if a chunk of them between 6220 and 10000 are not used, that hardly matters.

Run some queries

Time to migrate some data. In the following examples I execute all queries on the target database. All source data is coming from the Drupal tables in the oldevent database. These queries all need to be run by a MySQL user with INSERT privileges on the target Drupal database and at least SELECT privileges on the source Drupal database.

First, copy the users and give them a predictable non-clashing new user.uid. I exclude user id 1, as it already exists on the new Drupal site. I knew the other user's names would not clash, so I did not need to limit the query any further.

MySQL [source]> INSERT INTO users (SELECT uid+10000, name, pass, mail, mode, sort, threshold, theme, signature, signature_format, created, access, login, status, timezone, language, picture, init, data, timezone_name FROM oldevent.users WHERE oldevent.users.uid > 1);

Now copy all profile node CCK fields, making sure the nid and vid will not clash.

MySQL [source]> INSERT INTO content_type_profile (SELECT vid+10000, nid+10000, field_profile_firstname_value, field_profile_surname_value, field_profile_position_value, field_profile_organisation_value, field_profile_phone_value FROM oldevent.content_type_profile);

Note that iIf you have multi-value fields in your user's profiles, you will also need to perform this for all content_field_profile_* tables.

Next, copy across all profile nodes:

MySQL [source]> INSERT INTO node (SELECT nid+10000, vid+10000, type, language, title, uid+10000, status, created, changed, comment, promote, moderate, sticky, tnid, translate FROM oldevent.node WHERE oldevent.node.type='profile');

And finally, copy all profile node revisions across as well:

MySQL [source]> INSERT INTO node_revisions (SELECT nid+10000, vid+10000, uid+10000, title, body, teaser, log, timestamp, format FROM oldevent.node_revisions WHERE oldevent.node_revisions.nid IN (SELECT nid FROM oldevent.node WHERE oldevent.node.type='profile'));

And that's it. I did not need to copy any of the user's roles across, so at this point I was done. I migrated just under 1500 users and all revisions of their profiles from one Drupal to another in approximately two minutes.

Inevitable caveat!

If your users are set up with profile pictures, you'll need to make sure the users.picture field points at the right directory under files on the new Drupal site. You can do a quick search & replace query to update the path using REPLACE().

MySQL [source]> UPDATE users SET picture=REPLACE(picture, 'old-sub-string', 'new-sub-string');

Note that the picture filename will not match the user's uid anymore. If you use a tool like mmv or a shell script to rename the picture files, be sure to also update the field in the database. (Hint: generate a new file path via CONCAT() , using the user.uid field)

Trackback URL for this post: http://cafuego.net/trackback/390


PlanetMySQL Voting: Vote UP / Vote DOWN

Leaving MariaDB/Monty Program

Ноябрь 19th, 2010

Last week I announced internally that after my paternity leave ends next year, I will not be returning to Monty Program.

When I joined the company over a year ago I was immediately involved in drafting a project plan for the Open Database Alliance and its relation to MariaDB. We wanted to imitate the model of the Linux Foundation and Linux project, where the MariaDB project would be hosted by a non-profit organization where multiple vendors would collaborate and contribute. We wanted MariaDB to be a true community project, like most successful open source projects are - such as all other parts of the LAMP stack.

read more


PlanetMySQL Voting: Vote UP / Vote DOWN

Call for Papers: "MySQL and Friends" Developer Room at FOSDEM 2011 (Feb. 5th, Brussels, BE)

Ноябрь 2nd, 2010

FOSDEM banner

It's that time of the year again — the nice folks at FOSDEM have granted us a developer room at their upcoming conference (February 5+6 2011 in Brussels, Belgium)!

As usual there were more applications than they were able to accommodate, so we are very grateful for this opportunity for collaboration. Titled "MySQL and Friends", our room next year will be H.2213 with a capacity of 100 seats. It will be at our disposal on Saturday 5th, from 13:00 till 19:00. Like last year, we would like to set up a schedule of talks related to the MySQL server and the various projects that surround it. Each talk will last 20 minutes, plus 5 minutes of Q&A and a 5 minute break for switching speakers, giving us 12 slots in total to fill with excellent tech talks. Take a look at this year's schedule for some examples! I hope we can assemble an even more exciting and interesting schedule for next year.

Quoting from my last year's call for papers:

We are looking for covering a wide range of topics that attract both MySQL DBAs as well as application developers that work with MySQL as their database of choice. Are you developing a new storage engine or other plugin? Do you want to share your experiences and best practices in administering or deploying MySQL servers? Did you develop a new method to scale a MySQL setup? Let us and the audience know about it! You can submit your talk proposal via this submission form.

The deadline for turning in your proposal is Sunday, 26th of December, 2010, after which there will be a voting and rating period to identify the most interesting and attractive topics.

Please check the FOSDEM 2011 information page on the MySQL Forge Wiki for more details and don't hesitate to contact me directly, if you have any questions or suggestions. I look forward to your proposals!
PlanetMySQL Voting: Vote UP / Vote DOWN

LAMP stack Halloween cake

Октябрь 31st, 2010

Barry Jaspan and his wife Heather spent 20 hours creating this incredible cake for Acquia's Halloween party. Creative duo! Not only did it look great, it was yummy. Trick or treat!

Halloween
Halloween

PlanetMySQL Voting: Vote UP / Vote DOWN

Drupal 7 test drive appliance updated to 7.0-beta2, now with GUI option

Октябрь 26th, 2010

Drupal logoOver the weekend I updated my Drupal 7 test appliance in SUSE Studio to the Drupal 7.0-beta2 release, which was released on Oct. 23rd. I also added phpMyAdmin upon a user request, to provide a web-based method to work with the MySQL instance, if needed.

In addition to the lightweight "headless" appliance (which can only be accessed and configured via a remote network connection), I've now also created a GUI-based version. This appliance starts a minimal GNOME desktop and a Mozilla Firefox browser, which in turn opens the Drupal installation page by default. I hope you will find this useful if you want to toy around and test Drupal 7 without having to go through the entire OS and LAMP stack configuration yourself. In fact, you can even test this appliance via the recently added test drive option from right out of your web browser!

The appliance is now also available in OVF format. SuSE Studio now also builds Amazon EC2 images, which don't seem to be available for download from the SUSE Gallery yet. I assume this is a recent addition to the continuously improving SUSE Studio functionality, hopefully these images will be made available soon.


PlanetMySQL Voting: Vote UP / Vote DOWN

Testing Drupal 7 on a virtual appliance with MySQL 5.1 and the InnoDB plugin

Сентябрь 18th, 2010

Drupal logoThe Drupal community just recently released another alpha test release of their upcoming Drupal 7 version, to shake out the remaining bugs and to encourage more users to test it.

If you would like to give it a try, but you don't have a free server handy, how about using a virtual machine instead? Using the fabolous SuSE Studio, I've created an appliance based on openSUSE 11.3, Drupal 7.0-alpha7 and MySQL 5.1 with the InnoDB plugin and strict mode enabled (both for the SQL mode and InnoDB mode. Using this configuration helps to ensure that Drupal works well with the current version of MySQL/InnoDB and does not use any "questionable" SQL statements. This might be especially interesting for additional modules - Drupal core did not reveal any problems using strict mode so far.

You can download disk images for VMware/Virtualbox/KVM or XEN from the SUSE Gallery (free login required). Just boot the appliance in your virtualization application of choice, choose your keyboard layout and step through the network configuration and Time Zone selection. Once the appliance has booted up and the login: prompt appeared, point your web browser to the appliance's IP address to start the Drupal installation/configuration. MySQL has been pre-configured, there is an empty database named "drupal" and a user "drupal" with the same password to access it. You just need to enter this information in the Drupal Database configuration dialogue during the installation. Anything else can be configured to your liking.

After you have finished the installation, you can toy around with a fresh Drupal 7 installation! Install additional modules, change the themes, add content. And make sure to report all bugs that you run into while doing so! Have fun.


PlanetMySQL Voting: Vote UP / Vote DOWN

MariaDB going places: Drupal, Zend Framework

Сентябрь 10th, 2010

MariaDB just released MariaDB 5.1.50. Linux, Solaris and Windows (32-bit) are supported. Go give it a twirl.

Its also worth noting that today when you go install Drupal, and check out the documentation for requirements, you’ll see that MariaDB is now a recommended option.

Via the new news page at Monty Program’s newly designed website, I also found out that MariaDB is now recommended in the documentation for the next release of the Zend Framework. See the changelog.

Related posts:

  1. MariaDB 5.1.44 released
  2. VirtualBox images for MariaDB
  3. MariaDB 5.1.42 released!



PlanetMySQL Voting: Vote UP / Vote DOWN

Fast paging in the real world

Май 31st, 2010

This blag was originally posted at http://cafuego.net/2010/05/26/fast-paging-real-world

Some time ago I attended the “Optimisation by Design” course from Open Query¹. In it, Arjen teaches how writing better queries and schemas can make your database access much faster (and more reliable). One such way of optimising things is by adding appropriate query hints or flags. These hints are magic strings that control how a server executes a query or how it returns results.

An example of such a hint is SQL_CALC_FOUND_ROWS. You use it in a select query with a LIMIT clause. It instructs the server to select a limited numbers of rows, but also to calculate the total number of rows that would have been returned without the limit clause in place. That total number of rows is stored in a session variable, which can be retrieved via SELECT FOUND_ROWS(); That simply reads the variable and clears it on the server, it doesn’t actually have to look at any table or index data, so it’s very fast.

This is useful when queries are used to generate pages of data where a user can click a specific page number or click previous/next page. In this case you need the total number of rows to determine how many pages you need to generate links for.

The traditional way is to first run a SELECT COUNT(*) query and then select the rows you want, with LIMIT. If you don’t use a WHERE clause in your query, this can be pretty fast on MyISAM, as it has a magic variable that contains the number of rows in a table. On InnoDB however, which is my storage engine of choice, there is no such variable and consequently it’s not pretty fast.

Paging Drupal

At DrupalConSF earlier this year I’d floated the idea of making Drupal 7 use SQL_CALC_FOUND_ROWS in its pager queries. These are queries generated specifically to display paginated lists of content and the API to do this is pretty straightforward. To do it I needed to add query hint support to the MySQL driver. When it turned out that PostgreSQL and Oracle also support query hints though, the aim became adding hint support for all database drivers.

That’s now done, though only the patch only implements hints on the pager under MySQL at the moment.

One issue keeps cropping up though, a blog by Alexey Kovyrin in 2007 that states SELECT COUNT(*) is faster than using SQL_CALC_FOUND_ROWS. It’s all very well to not have a patch accepted if that statement is correct, but in my experience that is in fact not the case. In my experience the stats are in fact the other way around, SQL_CALC_FOUND_ROWS is nearly always faster than SELECT COUNT(*).

To back up my claims I thought I should run some benchmarks.

I picked the Drupal pager query that lists content (nodes) on the content administration page. It selects node IDs from the node table with a WHERE clause which filters by the content language. Or, in plain SQL, what currently happens is:

SELECT COUNT(*) FROM node WHERE language = 'und';
SELECT nid FROM node WHERE language = 'und' LIMIT 0,50;

and what I’d like to happen is:

SELECT SQL_CALC_FOUND_ROWS nid FROM node WHERE language = 'und' LIMIT 0,50;
SELECT FOUND_ROWS();

Methodology

I ran two sets of tests. One on a node table with 5,000 rows and one with 200,000 rows. For each of these table sizes I ran a pager with 10, 20, 50, 100 and 200 loops, each time increasing the offset by 50; effectively paging through the table. I ran all these using both MyISAM and InnoDB as the storage engine for the node table and I ran them on two machines. One was my desktop, a dual core Athlon X2 5600 with 4Gb of RAM and the other is a single core Xen virtual machine with 512Mb of RAM.

I was hoping to also run tests with 10,000,000 rows, but the virtual machine did not complete any of the queries. So I ran these on my desktop machine only. Again for 10, 20, 50, 100 and 200 queries per run. First with an offset of 50, then with an offset of 10,000. I restarted the MySQL server between each run. To discount query cache advantages, I ran all tests with the query cache disabled. The script I used is attached at the bottom of this post. The calculated times do include the latency of client/server communication, though all tests ran via the local socket connection.

My desktop runs an OurDelta mysql .5.0.87 (the -d10-ourdelta-sail66) to be exact. The virtual machine runs 5.0.87 (-d10-ourdelta65).  Before you complain that not running a vanilla MySQL invalidates the results, I run these because I am able to tweak InnoDB a bit more, so the I/O write load on the virtual machine is somewhat reduced compared to the vanilla MySQL.

Results

Query time graphs - NEW is faster than OLD and InnoDB is not slower than MyISAM

The graphs show that using SQL_CALC_FOUND_ROWS is virtually always faster than running two queries that each need to look at actual data. Even when using MyISAM. As the database gets bigger, the speed advantage of SQL_CALC_FOUND_ROWS increases. At the 10,000,000 row mark, it’s consistently about twice as fast.

Also interesting is that InnoDB seems significantly slower than MyISAM on the shorter runs. I say seems, because (especially with the 10,000,000 row table) the delay is caused by InnoDB first loading the table from disk into its buffer pool. In the spreadsheet you can see the first query takes up to 40 seconds, whilst subsequent ones are much faster. The MyISAM data is still in the OS file cache, so it doesn’t have that delay on the first query. Because I use innodb_flush_method=O_DIRECT, the InnoDB data is not kept in the OS file cache.

Conclusion

So, it’s official. COUNT(*) is dead, long live SQL_CALC_FOUND_ROWS!  :-)

I’ve attached my raw results as a Gnumeric document, so feel free to peruse them. The test script I’ve used is also attached, so you can re-run the benchmark on your own systems if you wish.

Conclusion Addendum

As pointed out in the Drupal pager issue that caused me to run these tests, the query I’m benchmarking uses the language column, which is not indexed and the test also doesn’t allow the server to cache the COUNT(*) query. I’ve rerun the tests with 10 million rows after adding an index and I no longer get a signification speed difference between the two ways of getting the total number of rows.

So I suppose that at least SQL_CALC_FOUND_ROWS will cause your non-indexed pager queries to suck a lot less than they might otherwise and it won’t hurt if they are properly indexed :-)

¹ I now work for Open Query as a consultant.


PlanetMySQL Voting: Vote UP / Vote DOWN

Loadays CFP

Февраль 11th, 2010

I would like to point the crowd to the Call For Presentaions of Loadays. , the Linux Open Administration Days .


The Linux Open Administration days 2010 will be the first edition of a new conference focusing on Linux and Open Administration, we are trying to fill a gap for System Engineers and Administrators using Open Source technologies"

More details on the Linux Open Administration Days site

I'll probably be there .. given the fact that the event will be 5 minutes from where I live .

Trackback URL for this post:

http://www.krisbuytaert.be/blog/trackback/984

PlanetMySQL Voting: Vote UP / Vote DOWN

Fosdem 2010

Февраль 9th, 2010

Fosdem 2010 was my 10th fosdem Edition (including the first OSDEM)
As every year Fosdem suffered even more from it's own success.

On Friday evening ther was the obligatory Beer event... however as people need to eat to .. the Devops crowd fled the scene

I had made reservations for a 20 something group and with the CentOS crowd joining us (as there was some overlap anyhow) we were 25 when we arrived in the restaurant .

Dinner and Discussions were great .. I learned about some new projects and we had some insightfull dicussions on how fat your thin foil should be ...

After dinner we went back to the Beer Event were lots of Free Beer was tasted ...

Saturday was the first full day of Fosdem, as usual Fosdem was the victim of it's own success , too much interresting stuff to see .. too little time.

Lots of Devrooms had the "FULL" sign put up more than you want as a visitor ...
I never even made it into the Drupal or NoSQL rooms :(

Sadly I had to correct Ploum's first law but for a lot of people .. Fosdem each time means the battle choosing which presentation you'll go to ... If you can even make it to the talk .. as usually the the hallway track is much more interresting :)

The MySQL devroom was on sunday but on saturday the crowd met in an Italian place nearby the Fosdem campus to get to know each other and chat a lot ..
The discussions ranged from French vs Canadaian and the future of Forks

During Saturday afternoon @patrickdebois suggested a Devops Meetup for Breakfast ... and some how that also happened ...

When I arrived at Fosdem on sunday morning they were still there :)

I spent the biggest part of Sunday in and around the MySQL devroom listening to a bunch of interresting talks

My own talk went fairly well given the time constraint .. sqeezing content for an hour in 20 minutes means you can't really go deep into the topics ...
My initial plan was to only focus on Pacemaker integration however the community had voted for the overview talk :)

After my own talk it was almost time to head to the Janson auditorium for the Footnote of Greg Kroah-Hartman, and as on any Fosdem .. there once again had to be Beer

Trackback URL for this post:

http://www.krisbuytaert.be/blog/trackback/983

PlanetMySQL Voting: Vote UP / Vote DOWN