Archive for the ‘XML’ Category

Linux Documentation Writer Wanted!

February 9th, 2012

The Oracle Linux and Virtualization Documentation Team is seeking an experienced Technical Writer with a focus on writing documentation for the Oracle Linux product. (The MySQL Documentation Team is part of that group as well.)

Applicants should be located in Ireland, the UK, Sweden, Norway, Denmark, or Finland (click on the links for a detailed job description).

We're a widely distributed team, with writers in Australia, North America, and Europe. Our infrastructure is based on DocBook XML, and we not only write docs but also maintain the whole processing and publication tool chain.

Key competencies you should have include:

  • Three or more years of previous experience writing software documentation (please provide URLs of writing samples I can look at!)
  • Experience with writing documentation for system level software and operating systems
  • Strong knowledge of the Linux operating system
  • Strong knowledge of XML, DocBook XML, and XSL style sheets (and motivation to help maintain and expand our tools and infrastructure)
  • Ability to administer own workstation and test environment
  • Good experience with distributed working environments and versioning systems such as SVN

If this sounds like something for you, follow the links above and send in your application!


Guidelines for generating XML

July 9th, 2010

Over the last little while I've come across quite a few XML feed generators written in PHP, with varying degrees of 'correctness'. Even though generating XML should be very simple, there are still quite a few pitfalls I feel every PHP (or insert-your-language) developer should know about.

1. You are better off using an XML library

This is the first and foremost rule. Most people end up generating their XML using simple string concatenation, even though there are many dedicated tools out there that really help you generate your own XML.

In PHP land the best example is XMLWriter. It is actually quite easy to use:

  <?php

  // Build the document in memory instead of writing to a file or stream.
  $xmlWriter = new XMLWriter();
  $xmlWriter->openMemory();
  $xmlWriter->startDocument('1.0','UTF-8');
  $xmlWriter->startElement('root');
  // text() escapes special characters such as < and & for you.
  $xmlWriter->text('Contents of the root tag');
  $xmlWriter->endElement(); // root
  $xmlWriter->endDocument();
  // Print the buffered XML.
  echo $xmlWriter->outputMemory();

  ?>

Granted, XMLWriter is verbose, but you have to worry a lot less about escaping and validating your XML documents.
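
The same point holds outside PHP. As a minimal sketch in Python (not PHP), assuming the standard library's xml.etree.ElementTree as the XML library, the two hypothetical helpers below build the same element once with a library and once with string concatenation. Only the library version survives text that contains & or <.

```python
import xml.etree.ElementTree as ET


def build_with_library(text: str) -> str:
    """Let the library escape special characters in the text."""
    root = ET.Element("root")
    root.text = text
    return ET.tostring(root, encoding="unicode")


def build_with_concat(text: str) -> str:
    """Naive string concatenation: nothing gets escaped."""
    return "<root>" + text + "</root>"


def is_well_formed(xml_text: str) -> bool:
    """Return True if an XML parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


tricky = "Fish & chips < 5"
print(is_well_formed(build_with_library(tricky)))
print(is_well_formed(build_with_concat(tricky)))
```

The concatenated version only works as long as nobody ever puts a special character in the data, which is exactly the kind of assumption that breaks in production.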

2. Understand Unicode

Do you know the difference between a byte, a character and a codepoint? If you don't, I'd probably think twice about hiring you. It's absolutely shocking how many programmers out there don't understand the basics of Unicode, UTF-8 and how they relate to the web.
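
For anyone unsure about the distinction, here is a small Python sketch (the describe helper is made up for this illustration). It shows that one visible character can be several bytes in UTF-8, and that one visible character can even be more than one codepoint.

```python
import unicodedata


def describe(text: str) -> dict:
    """List the codepoints and the UTF-8 bytes that make up a string."""
    return {
        "code_points": [f"U+{ord(ch):04X}" for ch in text],
        "utf8_bytes": " ".join(f"{b:02x}" for b in text.encode("utf-8")),
    }


euro = "\u20ac"          # the euro sign: one codepoint, three UTF-8 bytes
composed = "\u00e9"      # e with acute accent as a single codepoint
decomposed = "e\u0301"   # the same character as 'e' plus a combining accent

print(describe(euro))
print(describe(composed))
print(describe(decomposed))

# Normalizing to NFC turns the two-codepoint form into the single codepoint.
print(unicodedata.normalize("NFC", decomposed) == composed)
```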

An often-heard excuse, especially from people in English-speaking countries, is that there is no need to care about non-ASCII characters. However, if you need to use the euro sign (€) or if you deal with people copy-pasting from Word documents, you most definitely will come across problems.

A simple call to utf8_encode is not actually enough. If some of your source data was already encoded as UTF-8, it gets encoded a second time and you end up with garbled text. Only use utf8_encode if you know your source data is encoded as ISO-8859-1.
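
To see what goes wrong, here is a Python sketch that imitates the ISO-8859-1 to UTF-8 conversion with a hypothetical helper, latin1_to_utf8. It is an illustration of the conversion, not PHP's actual implementation. Feeding it bytes that are already UTF-8 produces the familiar double-encoded garbage.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Treat the input as ISO-8859-1 and re-encode it as UTF-8."""
    return raw.decode("iso-8859-1").encode("utf-8")


word = "caf\u00e9"  # 'café'

# Case 1: the source really is ISO-8859-1, so the conversion is correct.
from_latin1 = latin1_to_utf8(word.encode("iso-8859-1"))

# Case 2: the source was already UTF-8, so every multi-byte character
# is split into separate characters and encoded again.
from_utf8 = latin1_to_utf8(word.encode("utf-8"))

print(from_latin1.decode("utf-8"))
print(from_utf8.decode("utf-8"))
```

The second result is still valid UTF-8, which is why this kind of bug tends to go unnoticed until someone actually looks at the text.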

The one true way to go about it is to make sure that every step of the way in your web application is UTF-8, including your HTTP/HTML content type, your MySQL database and basically anything that ingests data for your application (email, CSV importers, XML readers, web services). Once you are absolutely sure every part of your application is UTF-8, and you have converted any old data, things will start to behave correctly.

3. CDATA is never a solution

It might be tempting to solve any encoding issues by simply surrounding your data with <![CDATA[ and ]]>. This might make sure that XML parsers don't throw an error when reading, but the document still contains 'incorrect' characters. If your XML document has CDATA sections, or you think you need CDATA, you are probably wrong.

More often than not, using CDATA actually stems from encoding problems (see section 2). CDATA is not a method to encode binary data; XML parsers will still throw errors if they come across certain byte sequences. If you really do need to encode binary data in XML, the best way is to use something like base64_encode instead.

If your XML feed uses CDATA because of encoding issues, you actually defer your problem to the consumer of your XML feed. So instead of seeing 'weird characters' on your side, the person that reads your XML feed now has no good way to detect which encoding was actually used. If it's for example an RSS feed you're generating, this can result in RSS readers throwing errors, or characters showing up incorrectly.
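
The binary data point can be sketched in Python as follows. The element name attachment, its encoding attribute and the helper names are assumptions made up for this example. The sketch shows base64 surviving a round trip, while a CDATA section containing control characters is rejected by the parser.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(payload: bytes) -> str:
    """Store arbitrary bytes as base64 text inside an element."""
    element = ET.Element("attachment", {"encoding": "base64"})
    element.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(element, encoding="unicode")


def unwrap_binary(xml_text: str) -> bytes:
    """Read the element back and decode the base64 text."""
    return base64.b64decode(ET.fromstring(xml_text).text)


def parses(xml_text: str) -> bool:
    """Return True if an XML parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


payload = bytes([0x00, 0x01, 0x02, 0xFF])

# Control characters are not allowed in XML, not even inside CDATA.
cdata_attempt = "<attachment><![CDATA[\x01\x02]]></attachment>"

print(parses(cdata_attempt))
print(unwrap_binary(wrap_binary(payload)) == payload)
```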

4. Be liberal with whitespace

An error like "unexpected character at line 1, column 176456" is much harder to debug than "line 5078, column 24". Whitespace between XML tags usually has no significance, so you can add as much indentation and as many line breaks (\n) as you want. Note that tools such as XMLWriter can indent for you once you call setIndent(true).
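
As a rough Python illustration of the debugging difference (the sample document and the error_position helper are made up for this sketch), the same mistake is reported on line 1 of a compact document but on its own line once the document is indented. Indentation here comes from the standard library's xml.dom.minidom.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def error_position(xml_text: str):
    """Return the (line, column) of the first parse error, or None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return error.position
    return None


compact = "<orders><order><order-number>5078</order-number></order></orders>"

# Re-serialize the same document with one tag per line and indentation.
pretty = minidom.parseString(compact).toprettyxml(indent="  ")

# Introduce the same mistake (a mismatched closing tag) in both versions.
broken_compact = compact.replace("</order-number>", "</order-nr>")
broken_pretty = pretty.replace("</order-number>", "</order-nr>")

print(error_position(broken_compact))
print(error_position(broken_pretty))
```

With a real feed of thousands of elements, the line number alone is often enough to find the offending record.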

5. Be verbose

Even though you might easily figure out that <ORD_NR> means 'order number', there's no reason not to state it outright as <order-number>. The following conventions appear to be favored by most people:

  • Use lowercase for tags and attribute names.
  • Use dashes (-) to separate words, not underscores (_).
  • Minimize the use of attributes; nested tags allow more flexibility.
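
If you are mapping existing column or field names to tags, the first two conventions are easy to automate. The to_tag_name helper below is a hypothetical Python sketch. Note that it only normalizes case and separators; expanding abbreviations such as ORD_NR into order-number still takes a human.

```python
import re


def to_tag_name(name: str) -> str:
    """Convert a legacy identifier to a lowercase, dash-separated tag name."""
    # Insert a dash at camelCase boundaries (lowercase or digit, then uppercase).
    with_breaks = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name)
    # Replace runs of underscores or whitespace with a single dash.
    dashed = re.sub(r"[_\s]+", "-", with_breaks)
    return dashed.lower()


for legacy in ("ORDER_NUMBER", "orderNumber", "Order Number"):
    print(legacy, "->", to_tag_name(legacy))
```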

6. Be careful with entities

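The only predefined entities in XML are &lt; (<), &gt; (>), &amp; (&), &quot; (") and &apos; ('), so any other named entity will simply not work and will throw errors unless a DTD declares it. Numeric character references such as &#8364; are always allowed.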

HTML DTDs add many entities, so if you're mostly used to writing HTML you might expect other entities to work. If your source data already has entities, you might have to get rid of these first.

In PHP this means you should use htmlspecialchars instead of htmlentities.
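
A rough Python equivalent of that advice, using only the standard library (the sample string and the parses helper are made up for illustration): html.unescape gets rid of HTML entities in the source data, and xml.sax.saxutils.escape then escapes only what XML requires.

```python
import html
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(xml_text: str) -> bool:
    """Return True if an XML parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Source data that already contains HTML entities.
source = "Caf&eacute; &amp; bar &lt;3"

# Dropping it into XML as-is fails: &eacute; is not defined in XML.
naive_doc = "<title>" + source + "</title>"

# Decode the HTML entities first, then escape only &, < and >.
plain = html.unescape(source)
safe_doc = "<title>" + escape(plain) + "</title>"

print(parses(naive_doc))
print(parses(safe_doc))
print(ET.fromstring(safe_doc).text)
```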

Feel free to discuss, disagree, or add on to this list in the comments, I'm happy to hear your experiences.


Breaking news: SHOW INNODB STATUS ported to XML

April 20th, 2010

If you’re like me, you’ve gotten tired of writing endless test cases for parsers that can understand the thousands of variations of text output by SHOW INNODB STATUS. I’ve decided to solve this issue once and for all by patching MySQL and InnoDB to output XML, the universal markup format, so tools can understand and manipulate it easily. Here’s a sample snippet:

<status><![CDATA[
=====================================
100320 15:46:24 INNODB MONITOR OUTPUT
=====================================
... text omitted, but you get the idea ...
]]>
</status>

PS: Yes, this is a late April Fool’s joke.



Restoring XML-formatted MySQL dumps

April 20th, 2010

To whom it may concern -

The mysqldump program can be used to make logical database backups. Although the vast majority of people use it to create SQL dumps, it is possible to dump both schema structure and data in XML format. There are a few bugs (#52792, #52793) in this feature, but these are not the topic of this post.

XML output from mysqldump

Dumping in XML format is done with the --xml or -X option. In addition, you should use the --hex-blob option; otherwise the BLOB data will be dumped as raw binary data, which usually results in characters that are not valid, either according to the XML spec or according to the UTF-8 encoding. (Arguably, this is also a bug. I haven't filed it though.)

For example, a line like:

mysqldump -uroot -pmysql -X --hex-blob --databases sakila

dumps the sakila database to the following XML format:

<?xml version="1.0"?>
<mysqldump xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<database name="sakila">
<table_structure name="actor">
<field Field="actor_id" Type="smallint(5) unsigned" Null="NO" Key="PRI" Extra="auto_increment" />
<field Field="first_name" Type="varchar(45)" Null="NO" Key="" Extra="" />
<field Field="last_name" Type="varchar(45)" Null="NO" Key="MUL" Extra="" />
<field Field="last_update" Type="timestamp" Null="NO" Key="" Default="CURRENT_TIMESTAMP" Extra="on update CURRENT_TIMESTAMP" />
<key Table="actor" Non_unique="0" Key_name="PRIMARY" Seq_in_index="1" Column_name="actor_id" Collation="A" Cardinality="200" Null="" Index_type="BTREE" Comment="" />
<key Table="actor" Non_unique="1" Key_name="idx_actor_last_name" Seq_in_index="1" Column_name="last_name" Collation="A" Cardinality="200" Null="" Index_type="BTREE" Comment="" />
<options Name="actor" Engine="InnoDB" Version="10" Row_format="Compact" Rows="200" Avg_row_length="81" Data_length="16384" Max_data_length="0" Index_length="16384" Data_free="233832448" Auto_increment="201" Create_time="2009-10-10 10:04:56" Collation="utf8_general_ci" Create_options="" Comment="" />
</table_structure>
<table_data name="actor">
<row>
<field name="actor_id">1</field>
<field name="first_name">PENELOPE</field>
<field name="last_name">GUINESS</field>
<field name="last_update">2006-02-15 03:34:33</field>
</row>

...many more rows and table structures...

</database>
</mysqldump>

I don't want to spend too much time discussing why it would be useful to make backups in this way. There are definitely a few drawbacks. For example, for sakila, the plain SQL dump, even with --hex-blob, is 3.26 MB (3,429,358 bytes), whereas the XML output is 13.7 MB (14,415,665 bytes). Even after zip compression, the XML formatted dump is still one third larger than the plain SQL dump: 936 kB versus 695 kB.

Restoring XML output from mysqldump

A more serious problem is that MySQL doesn't seem to offer any tool to restore XML formatted dumps. The LOAD XML feature, kindly contributed by Erik Wetterberg, could be used to some extent for this purpose. However, this feature is not yet available (it will be available in the upcoming version, MySQL 5.5), and from what I can tell, it can only load data, not restore tables or databases. I also believe that this feature does not (yet) provide any way to properly restore hex-dumped BLOB data, but I really should test it to know for sure.

Anyway.

In between sessions of the past MySQL users conference I cobbled together an XSLT stylesheet that can convert mysqldump's XML output back to SQL script output. It is available under the LGPL license, and it is hosted on Google Code as the mysqldump-x-restore project. To get started, you need to download the mysqldump-xml-to-sql.xslt XSLT stylesheet. You also need a command line XSLT processor, like xsltproc. This utility is part of the Gnome libxslt project, and is included in packages for most Linux distributions. There is a Windows port available for which you can download the binaries.

Assuming that xsltproc is in your path, and the XML dump and the mysqldump-xml-to-sql.xslt are in the current working directory, you can use this command to convert the XML dump to SQL:

xsltproc mysqldump-xml-to-sql.xslt sakila.xml > sakila.sql

On Unix-based systems you should be able to pipe the SQL directly into mysql using

xsltproc mysqldump-xml-to-sql.xslt sakila.xml | mysql -uroot -pmysql

The stylesheet comes with a number of options, which can be set through xsltproc's --stringparam option. For example, setting the schema parameter to N will result in an SQL script that only contains DML statements:

xsltproc --stringparam schema N mysqldump-xml-to-sql.xslt sakila.xml > sakila.sql

Setting the data option to N will result in an SQL script that only contains DDL statements:

xsltproc --stringparam data N mysqldump-xml-to-sql.xslt sakila.xml > sakila.sql

There are additional options to control how often a COMMIT should be issued, whether to add DROP statements, whether to generate single row INSERT statements, and to set the max_allowed_packet size.
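
For readers who just want to see the idea behind turning <table_data> rows back into SQL, here is a minimal Python sketch. It is not the mysqldump-x-restore stylesheet and does far less: it handles data only, quotes every value as a string, and ignores hex-dumped BLOBs. The function names are made up for this example, and treating an xsi:nil="true" attribute as NULL is an assumption about the dump format rather than something shown in the sample above.

```python
import xml.etree.ElementTree as ET

# A trimmed-down dump in the format shown earlier in this post.
SAMPLE = """<?xml version="1.0"?>
<mysqldump xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<database name="sakila">
<table_data name="actor">
<row>
<field name="actor_id">1</field>
<field name="first_name">PENELOPE</field>
<field name="last_name">GUINESS</field>
<field name="last_update">2006-02-15 03:34:33</field>
</row>
</table_data>
</database>
</mysqldump>"""

# Assumption: NULL values are marked with xsi:nil="true".
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def quote_identifier(name: str) -> str:
    """Quote a table or column name with backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_value(field: ET.Element) -> str:
    """Render a field as NULL or as a quoted SQL string literal."""
    if field.get(XSI_NIL) == "true":
        return "NULL"
    text = field.text or ""
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


def rows_to_inserts(xml_text: str) -> list:
    """Generate one single-row INSERT statement per <row> element."""
    statements = []
    for table in ET.fromstring(xml_text).iter("table_data"):
        table_name = quote_identifier(table.get("name"))
        for row in table.iter("row"):
            fields = row.findall("field")
            columns = ", ".join(quote_identifier(f.get("name")) for f in fields)
            values = ", ".join(quote_value(f) for f in fields)
            statements.append(
                f"INSERT INTO {table_name} ({columns}) VALUES ({values});"
            )
    return statements


for statement in rows_to_inserts(SAMPLE):
    print(statement)
```

Quoting numeric values as strings relies on MySQL converting them on insert; a real restore tool would want to consult the <table_structure> section for column types.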

What's next?

Nothing much really. I don't really recommend using mysqldump's XML output. I wrote mysqldump-x-restore for those people that inherited a bunch of XML formatted dumps, and don't know what to do with them. I haven't thoroughly tested it, so please file a bug if you find one. If you actually think it's useful and you want more features, please let me know, and I'll look into it. I don't have much use for this myself, so if you have great ideas to move this forward, I'll let you have commit access.

That is all.


Kontrollbase – revision 297 fixes Reporter-CLI “alert_22” sub-routine

April 13th, 2010

Quick note to let our users know that there was an XML tag closure error on the “alert_22” subroutine in the “bin/kontroll-reporter-cli.pl” script. This does not affect the webapp portion of Kontrollbase – only reports generated via the command line reporter script. It is not a fatal error but will cause the XML file to [...]

Kontrollbase reporter XML Parser error has been fixed

January 22nd, 2010

If you have seen the following error on the Perf Report tab, “Message: SimpleXMLElement::__construct()…”, it has been fixed in revision 281. This only affects alerts 11 and 12, so you might not run into it immediately. The solution is to either remove lines “586, 590, 639, 650” from the bin/kontroll-reporter-5.0.x_linux-x86-2.0.1.pl file, or to run [...]

RESTful PHP Web Services – reviewed

January 21st, 2010

I’ve been using a lot of RESTful services these days and have been waiting for a good book that is dedicated to the topic. I recently received a copy of ‘RESTful PHP Web Services’, which does a successful job of outlining proven concepts in current web technology. If you want to learn the methods for creating and consuming RESTful services then you will find many examples in this book. From the architectural plans to well thought out code samples, the book covers a lot of ground in a relatively quick read.

The first chapter gives the reader a quick introduction to RESTful services and the most common PHP frameworks in use at the time of writing. I particularly enjoyed the section on the Zend framework due to the explanation of benefits over the other frameworks. The chapter also covers the very basics, which include a detailed look at exactly what 'RESTful services' means and what technologies are required to use and benefit from a RESTful architecture. The second chapter gives a quick rundown of the various methods in use for consumption of data: cURL, several HTTP methods, and processing data with XML, DOM, and SimpleXML. After those are covered there is a simple example of consuming services like Flickr using the previous methods. This transitions into many more examples of consuming real world services that any developer would find interesting and exciting for data mashups.

The real meat of the book starts in chapter four, where we get into designing the resource utilization systems, and continues with the resource clients in chapter five. Those topics basically go over the nuts and bolts of gathering data, manipulating it, updating it, as well as creating fresh data. We get more instruction and usage examples on the Zend framework in chapter seven, where the author gives us information on the controllers, models, and views (the MVC pattern). This would not be too useful without knowing how to debug the code that we’re using, so there is, thankfully, a chapter dedicated to debugging XML building and parsing errors. A couple of short appendixes cover the author’s own WSO2 web service framework as well as REST Client Classes, which should prove useful for writing your own reusable classes.

Overall this book covers the majority of topics that a new developer needs to understand in order to start developing and deploying RESTful code and web services in PHP. From frameworks to consumable service samples, and everything in between, RESTful PHP Web Services comes through in a concise and enjoyable style that will not disappoint. I highly recommend this book for developers that are new to this topic or experienced developers that need a quick refresher course.
