Archive for the ‘Development’ Category

common_schema rev. 218: QueryScript, throttling, processes, documentation

Февраль 8th, 2012

common_schema, revision 218 is released, with major new features, top one being server side scripting. Here are the highlights:

  • QueryScript: server side scripting is now supported by common_schema, which acts as an interpreter for QueryScript code.
  • Throttling for queries is now made available via the throttle() function.
  • Enhancements to processlist-related views, including the new slave_hosts view.
  • Inline documentation/help is available via the help() routine.
  • more...

QueryScript

common_schema makes for a QueryScript implementation for MySQL. You can run server side scripts, interpreted by common_schema, which allow for easy syntax and greater power than was otherwise previously available on the MySQL server. For example:

foreach($table, $schema, $engine: table like '%')
  if ($engine = 'ndbcluster')
    ALTER ONLINE TABLE :$schema.:$table REORGANIZE PARTITION;

QueryScript includes flow control, conditional branching, variables & variable expansion, script throttling and more.

Read more on common_schema's QueryScript implementation.

Query throttling

Throttling for MySQL queries was suggested by means of elaborate query manipulation. It is now reduced into a single throttle function: one can now just invoke throttle(3) on one's query, so as to make the query execute for a longer time, while taking short sleep breaks during operation, easing up the query's demand for resources.

Read more on query throttling.

Process views

The processlist_grantees view provides with more details on the running processes. slave_hosts is a new view, listing hostnames of connected slaves.

Read more on process views.

help()

The common_schema documentation is now composed of well over 100 pages, including synopsis, detailed internals discussion, notes and examples. I can't exaggerate in saying that the documentation took the vast majority of time for this code to release.

The documentation is now made available inline, from within you mysql client, via the help() routine. Want to know more about redundant (duplicate) keys and how to find them? Just type:

call help('redundant');

and see what comes out!

The entire documentation, which is available online as well as a downloadable bundle, is embedded into common_schema itself. It's rather cool.

Tests

common_schema is tested. The number of tests in common_schema is rapidly growing, and new tests are introduced for new features as well as for older ones. There is not yet full coverage for all views, but I'm working hard at it. common_schema is a robust piece of code!

Get it!

Download common_schema on the common_schema project page.

Read the documentation online, or download it as well (or call for help())

common_schema is released under the BSD license.


PlanetMySQL Voting: Vote UP / Vote DOWN

QueryScript: SQL scripting language

Февраль 8th, 2012

Introducing QueryScript: a programming language aimed for SQL scripting, seamlessly combining scripting power such as flow control & variables with standard SQL statements or RDBMS-specific commands.

QueryScript is available fro MySQL via common_schema, which adds MySQL-specific usage.

What does QueryScript look like? Here are a few code samples:

Turn a bulk DELETE operation into smaller tasks. Throttle in between.

while (DELETE FROM archive.events WHERE ts < CURDATE() LIMIT 1000)
{
  throttle 2;
}

Convert all InnoDB tables in the 'sakila' database to compressed format:

foreach ($table, $schema, $engine: table in sakila)
{
  if ($engine = 'InnoDB')
    ALTER TABLE :$schema.:$table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
}

Shard your data across multiple schemata:

foreach($shard: {USA, GBR, JAP, FRA})
{
  CREATE DATABASE db_:$shard;
  CREATE TABLE db_:$shard.city LIKE world.City;
  INSERT INTO db_:$shard.city SELECT * FROM world.City WHERE CountryCode = $shard;
}

This tight integration between script and SQL, with the power of iteration, conditional statements, variables, variable expansion, throttling etc., makes QueryScript a power tool, with capabilities superseding those of stored routines, and allowing for simplified, dynamic code.

QueryScript code is interpreted. It's just a text, so it can be read from a @user_defined_variable, a table column, text file, what have you. For example:

mysql> set @script := "while (TIME(SYSDATE()) < '17:00:00') SELECT * FROM world.City WHERE id = 1 + FLOOR((RAND()*4079));";
mysql> call run(@script);

For more details, consult the QueryScript site.

If you're a MySQL user/DBA, better read the common_schema QueryScript documentation, to better understand the specific common_schema implementation and enhanced features.

common_schema, including the QueryScript interpreter, can be downloaded from the common_schema project page.


PlanetMySQL Voting: Vote UP / Vote DOWN

Documentation in SQL: CALL for help()

Январь 11th, 2012

Documentation is an important part of any project. On the projects I maintain I put a lot of effort on documentation, and, frankly, the majority of time spent on my projects is on documentation.

The matter of keeping the documentation faithful is a topic of interest. I'd like to outline a few documentation bundling possibilities, and the present the coming new documentation method for common_schema. I'll talk about any bundling that is NOT man pages.

High level: web docs

This is the initial method of documentation I used for openark kit and mycheckpoint. It's still valid for mycheckpoint. Documentation is web-based. You need Internet access to read it. It's in HTML format.

Well, not exactly HTML format: I wrote it in WordPress. Yes, it's HTML, but there's a lot of noise around (theme, menus, etc.) which is not strictly part of the documentation.

While this is perhaps the easiest way to go, here's a few drawbacks:

  • You're bound to some framework (WordPress in this case)
  • Docs are split between MySQL database (my underlying WordPRess storage) & WordPress files (themes, style, header, footer etc.)
  • Documentation is separate from your code - they're just not in the same place
  • There is no version control over the documentation.

The result is a single source of documentation, which applies to whatever version is latest. It's impossible to maintain docs for multiple versions. You must manually synchronize your WordPress updates with code commits (or rather - code release!).

Mid level: version controlled HTML docs

I first saw this approach on Baron's Aspersa gets a user manual post. I loved it: the documentation is HTML, but stored as part of your project's code, in same version control.

This means one can browse the documentation (openark kit in this example) exactly as it appears in the baseline. Depending on your project hosting, one may be able to do so per version.

The approach has the great benefit of having the docs tightly coupled with the code in terms of development. Before committing code, one updates documentation for that code, then commits/releases both together.

You're also not bound to any development framework. You may edit with vim, emacs, gedit, bluefish, eclipse, ... any tool of your choice. It's all down to plain old text files.

Mid level #2: documentation bundling

One thing I started doing with common_schema is to release a doc bundle with the code. So one can download a compressed bundle of all HTML files. That way one is absolutely certain what's the right documentation for revision 178. There's no effort about it: the docs are already tightly coupled with code versions. Just compress and distribute.

Low level: documentation coupled with your code

Perl scripts can be written as Perl modules, in which case they are eligible for using the perldoc convention. You code your documentation within your script itself, as comment. Perldoc can extract the documentation and present in man-like format. Same happens with Python's pydoc. Baron's When documentation is code illustrates that approach. Maatkit (now Percona Toolkit) has been using it for years.

This method has the advantage of having the documentation ready right within your shell. You don't need a browser, nor firewall access. The docs are just there for you in the same environment where you're executing the code.

SQL Low level: CALL for help()

common_schema is a different type of project. It is merely a schema. There's no Perl nor Python. One imports the schema into one's MySQL server.

What's the low-level approach for this type of code?

For common_schema I use three levels of documentation: the mid-level, where one can browse through the versioned docs, the 2nd mid-level, where one can download bundled documentation, and then a low-level approach: documentation embedded within the code.

MySQL's documentation is also built into the server: see the help_* tables within the mysql schema. The mysql command line client allows one to access help by supporting the help command, e.g.

mysql> help create table;

The client intercepts this command (this is not server side command) and searches through the mysql.help_* docs.

With common_schema, I don't have control over the client; it's all on server side. But the code being a schema, what with stored routines and tables, it's easy enough to set up documentation.

As of the next version of common_schema, and following MySQL's method, common_schema provides a help table:

DESC help;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| topic        | varchar(32) | NO   | PRI | NULL    |       |
| help_message | text        | NO   |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+

And a help() procedure, so that you can call for help(). The procedure will look for the best matching document based on your search expression:

root@mysql-5.1.51> CALL help('match');
+-------------------------------------------------------------------------------+
| help                                                                          |
+-------------------------------------------------------------------------------+
|                                                                               |
| NAME                                                                          |
|                                                                               |
| match_grantee(): Match an existing account based on user+host.                |
|                                                                               |
| TYPE                                                                          |
|                                                                               |
| Function                                                                      |
|                                                                               |
| DESCRIPTION                                                                   |
|                                                                               |
| MySQL does not provide with identification of logged in accounts. It only     |
| provides with user + host:port combination within processlist. Alas, these do |
| not directly map to accounts, as MySQL lists the host:port from which the     |
| connection is made, but not the (possibly wildcard) user or host.             |
| This function matches a user+host combination against the known accounts,     |
| using the same matching method as the MySQL server, to detect the account     |
| which MySQL identifies as the one matching. It is similar in essence to       |
| CURRENT_USER(), only it works for all sessions, not just for the current      |
| session.                                                                      |
|                                                                               |
| SYNOPSIS                                                                      |
|                                                                               |
|                                                                               |
|                                                                               |
|        match_grantee(connection_user char(16) CHARSET utf8,                   |
|        connection_host char(70) CHARSET utf8)                                 |
|          RETURNS VARCHAR(100) CHARSET utf8                                    |
|                                                                               |
|                                                                               |
| Input:                                                                        |
|                                                                               |
| * connection_user: user login (e.g. as specified by PROCESSLIST)              |
| * connection_host: login host. May optionally specify port number (e.g.       |
|   webhost:12345), which is discarded by the function. This is to support      |
|   immediate input from as specified by PROCESSLIST.                           |
|                                                                               |
|                                                                               |
| EXAMPLES                                                                      |
|                                                                               |
| Find an account matching the given use+host combination:                      |
|                                                                               |
|                                                                               |
|        mysql> SELECT match_grantee('apps', '192.128.0.1:12345') AS            |
|        grantee;                                                               |
|        +------------+                                                         |
|        | grantee    |                                                         |
|        +------------+                                                         |
|        | 'apps'@'%' |                                                         |
|        +------------+                                                         |
|                                                                               |
|                                                                               |
|                                                                               |
| ENVIRONMENT                                                                   |
|                                                                               |
| MySQL 5.1 or newer                                                            |
|                                                                               |
| SEE ALSO                                                                      |
|                                                                               |
| processlist_grantees                                                          |
|                                                                               |
| AUTHOR                                                                        |
|                                                                               |
| Shlomi Noach                                                                  |
|                                                                               |
+-------------------------------------------------------------------------------+

I like HTML for documentation. I think it's a good format, provided you don't start doing funny things. Perhaps TROFF is more suitable; certainly more popular on Unix machines. But I already have everything in HTML. So, what do I do?

My decision was to keep documentation in HTML, and use the handy html2text tool to do the job. And it does it pretty well! The sample you see above is an automated translation of HTML to plain text.

I add a few touches of my own: SELECTing long texts is ugly, whether you do it via ";" or "\G". The help() routine breaks the text by '\n', returning a multi row result set. The above sample makes for some 60+ rows, nicely formatted, broken from the original single text appearing in the help table.

So now you have an internal help method for common_schema, right where the code is. You don't have to leave the command line client in order to get help.

Giuseppe offered me the idea for this, even while my own thinking about it was in early stages.

The next version of common_schema will be available in a few weeks. The code is pretty much ready. I just need to work on, ahem..., the documentation.


PlanetMySQL Voting: Vote UP / Vote DOWN

Quick (and dirty) Patch for Ruby Enterprise Edition 2011.03 to Prevent Hash Collision Attacks

Декабрь 29th, 2011

Since there is no patch for Ruby Enterprise Edition 2011.03 to prevent the Hash Collision DoS Attack, I’ve quickly ported ruby 1.8.7 patchlevel 357 patch. Here it is:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
From e19bd3eaa8bd71cfc9e5bf436527f015b093f31e Mon Sep 17 00:00:00 2001
From: shyouhei <shyouhei@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date: Wed, 28 Dec 2011 12:47:15 +0000
Subject: [PATCH] -This line, and those below, will be ignored--

M    ruby_1_8_7/inits.c
M    ruby_1_8_7/string.c
M    ruby_1_8_7/st.c
M    ruby_1_8_7/test/ruby/test_string.rb
M    ruby_1_8_7/random.c


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_1_8_7@34151 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
---
 ChangeLog                |   26 ++++++++++++++++
 inits.c                  |    4 ++
 random.c                 |   74 +++++++++++++++++++++++++++++++++++----------
 st.c                     |   14 ++++++++-
 string.c                 |    7 ++++-
 test/ruby/test_string.rb |   13 ++++++++
 version.h                |    8 ++--
 7 files changed, 123 insertions(+), 23 deletions(-)

diff --git a/inits.c b/inits.c
index 947bbbe..a0e061f 100644
--- a/inits.c
+++ b/inits.c
@@ -38,6 +38,7 @@
 void Init_sym _((void));
 void Init_process _((void));
 void Init_Random _((void));
+void Init_RandomSeed _((void));
 void Init_Range _((void));
 void Init_Regexp _((void));
 void Init_signal _((void));
@@ -46,10 +47,13 @@
 void Init_Time _((void));
 void Init_var_tables _((void));
 void Init_version _((void));
+void Init_st _((void));
 
 void
 rb_call_inits()
 {
+    Init_RandomSeed();
+    Init_st();
     Init_sym();
     Init_var_tables();
     Init_Object();
diff --git a/random.c b/random.c
index c0560e3..24a0787 100644
--- a/random.c
+++ b/random.c
@@ -189,6 +189,7 @@
 #include <fcntl.h>
 #endif
 
+static int seed_initialized = 0;
 static VALUE saved_seed = INT2FIX(0);
 
 static VALUE
@@ -250,27 +251,22 @@
     return old;
 }
 
-static VALUE
-random_seed()
+#define DEFAULT_SEED_LEN (4 * sizeof(long))
+
+static void
+fill_random_seed(ptr)
+    char *ptr;
 {
     static int n = 0;
+    unsigned long *seed;
     struct timeval tv;
     int fd;
     struct stat statbuf;
+    char *buf = (char*)ptr;
 
-    int seed_len;
-    BDIGIT *digits;
-    unsigned long *seed;
-    NEWOBJ(big, struct RBignum);
-    OBJSETUP(big, rb_cBignum, T_BIGNUM);
-
-    seed_len = 4 * sizeof(long);
-    big->sign = 1;
-    big->len = seed_len / SIZEOF_BDIGITS + 1;
-    digits = big->digits = ALLOC_N(BDIGIT, big->len);
-    seed = (unsigned long *)big->digits;
+    seed = (unsigned long *)buf;
 
-    memset(digits, 0, big->len * SIZEOF_BDIGITS);
+    memset(buf, 0, DEFAULT_SEED_LEN);
 
 #ifdef S_ISCHR
     if ((fd = open("/dev/urandom", O_RDONLY
@@ -285,7 +281,7 @@
 #endif
             )) >= 0) {
         if (fstat(fd, &statbuf) == 0 && S_ISCHR(statbuf.st_mode)) {
-            read(fd, seed, seed_len);
+            read(fd, seed, DEFAULT_SEED_LEN);
         }
         close(fd);
     }
@@ -296,13 +292,37 @@
     seed[1] ^= tv.tv_sec;
     seed[2] ^= getpid() ^ (n++ << 16);
     seed[3] ^= (unsigned long)&seed;
+}
+
+static VALUE
+make_seed_value(char *ptr)
+{
+    BDIGIT *digits;
+    NEWOBJ(big, struct RBignum);
+    OBJSETUP(big, rb_cBignum, T_BIGNUM);
+
+    RBIGNUM_SET_SIGN(big, 1);
+
+    digits = ALLOC_N(char, DEFAULT_SEED_LEN);
+    RBIGNUM(big)->digits = digits;
+    RBIGNUM(big)->len = DEFAULT_SEED_LEN / SIZEOF_BDIGITS;
+
+    MEMCPY(digits, ptr, char, DEFAULT_SEED_LEN);
 
     /* set leading-zero-guard if need. */
-    digits[big->len-1] = digits[big->len-2] <= 1 ? 1 : 0;
+    digits[RBIGNUM_LEN(big)-1] = digits[RBIGNUM_LEN(big)-2] <= 1 ? 1 : 0;
 
     return rb_big_norm((VALUE)big);
 }
 
+static VALUE
+random_seed(void)
+{
+    char buf[DEFAULT_SEED_LEN];
+    fill_random_seed(buf);
+    return make_seed_value(buf);
+}
+
 /*
  *  call-seq:
  *     srand(number=0)    => old_seed
@@ -443,6 +463,9 @@
     long val, max;
 
     rb_scan_args(argc, argv, "01", &vmax);
+    if (!seed_initialized) {
+       rand_init(random_seed());
+    }
     switch (TYPE(vmax)) {
       case T_FLOAT:
    if (RFLOAT(vmax)->value <= LONG_MAX && RFLOAT(vmax)->value >= LONG_MIN) {
@@ -490,6 +513,8 @@
     return LONG2NUM(val);
 }
 
+static char initial_seed[DEFAULT_SEED_LEN];
+
 void
 rb_reset_random_seed()
 {
@@ -497,9 +522,24 @@
 }
 
 void
+Init_RandomSeed(void)
+{
+    fill_random_seed(initial_seed);
+    init_by_array((unsigned long*)initial_seed, DEFAULT_SEED_LEN/sizeof(unsigned long));
+    seed_initialized = 1;
+}
+
+static void
+Init_RandomSeed2(void)
+{
+    saved_seed = make_seed_value(initial_seed);
+    memset(initial_seed, 0, DEFAULT_SEED_LEN);
+}
+
+void
 Init_Random()
 {
-    rand_init(random_seed());
+    Init_RandomSeed2();
     rb_define_global_function("srand", rb_f_srand, -1);
     rb_define_global_function("rand", rb_f_rand, -1);
     rb_global_variable(&saved_seed);
diff --git a/st.c b/st.c
index c16c310..21e157a 100644
--- a/st.c
+++ b/st.c
@@ -9,6 +9,7 @@
 #include <stdlib.h>
 #endif
 #include <string.h>
+#include <limits.h>
 #include "st.h"
 
 typedef struct st_table_entry st_table_entry;
@@ -521,6 +522,8 @@ struct st_table_entry {
     return 0;
 }
 
+static unsigned long hash_seed = 0;
+
 static int
 strhash(string)
     register const char *string;
@@ -550,10 +553,11 @@ struct st_table_entry {
 
     return val + (val << 15);
 #else
-    register int val = 0;
+    register unsigned long val = hash_seed;
 
     while ((c = *string++) != '\0') {
    val = val*997 + c;
+   val = (val << 13) | (val >> (sizeof(st_data_t) * CHAR_BIT - 13));
     }
 
     return val + (val>>5);
@@ -573,3 +577,11 @@ struct st_table_entry {
 {
     return n;
 }
+
+extern unsigned long rb_genrand_int32(void);
+
+void
+Init_st(void)
+{
+    hash_seed = rb_genrand_int32();
+}
diff --git a/string.c b/string.c
index c6b2301..94a0281 100644
--- a/string.c
+++ b/string.c
@@ -875,13 +875,15 @@
     return str1;
 }
 
+static unsigned long hash_seed;
+
 int
 rb_str_hash(str)
     VALUE str;
 {
     register long len = RSTRING(str)->len;
     register char *p = RSTRING(str)->ptr;
-    register int key = 0;
+    register unsigned long key = hash_seed;
 
 #if defined(HASH_ELFHASH)
     register unsigned int g;
@@ -905,6 +907,7 @@
     while (len--) {
    key = key*65599 + *p;
    p++;
+   key = (key << 13) | (key >> ((sizeof(unsigned long) * CHAR_BIT) - 13));
     }
     key = key + (key>>5);
 #endif
@@ -5062,4 +5065,6 @@ struct tr {
     rb_fs = Qnil;
     rb_define_variable("$;", &rb_fs);
     rb_define_variable("$-F", &rb_fs);
+
+    hash_seed = rb_genrand_int32();
 }
diff --git a/test/ruby/test_string.rb b/test/ruby/test_string.rb
index 5f2c54f..4d97182 100644
--- a/test/ruby/test_string.rb
+++ b/test/ruby/test_string.rb
@@ -1,4 +1,5 @@
 require 'test/unit'
+require File.expand_path('envutil', File.dirname(__FILE__))
 
 class TestString < Test::Unit::TestCase
   def check_sum(str, bits=16)
@@ -29,4 +30,16 @@ def test_inspect
   ensure
     $KCODE = original_kcode
   end
+
+  def test_hash_random
+    str = 'abc'
+    a = [str.hash.to_s]
+    cmd = sprintf("%s -e 'print %s.hash'", EnvUtil.rubybin, str.dump)
+    3.times {
+      IO.popen(cmd, "rb") {|o|
+        a << o.read
+      }
+    }
+    assert_not_equal([str.hash.to_s], a.uniq)
+  end
 end

--- a/version.c 2011-12-19 03:22:43.000000000 +0000
+++ b/version.c 2011-12-29 18:18:58.000000000 +0000
@@ -46,7 +46,7 @@
     rb_define_global_const("RUBY_PATCHLEVEL", INT2FIX(RUBY_PATCHLEVEL));
 
     snprintf(description, sizeof(description),
-             "ruby %s (%s %s %d) [%s], MBARI 0x%x, Ruby Enterprise Edition %s",
+             "ruby %s (%s %s %d) [%s], MBARI 0x%x, Ruby Enterprise Edition %s (with hash random)",
              RUBY_VERSION, RUBY_RELEASE_DATE, RUBY_RELEASE_STR,
              RUBY_RELEASE_NUM, RUBY_PLATFORM,
              STACK_WIPE_SITES, REE_VERSION);
--
1.7.5.4

You can view it or download it from github.

Disclaimer: This is provided as is, no guarantees are provided, etc.



PlanetMySQL Voting: Vote UP / Vote DOWN

Oracle "Technologist of the Year: Developer" Award

Декабрь 21st, 2011

I am honored to receive Oracle's Technologist of the Year: Developer award, formerly Oracle Magazine Editors’ Choice Awards.

Technologist of the Year Award is given for individuals for their technical achievements with regard to Oracle products.

As opposed to community based awards, to win this award one must be nominated by himself or his company. There are several award categories: Developer, DBA, IT Manager etc., and many nominations per category. I have been nominated by my company and am happy to have won the award in the Developer category.

Allow me to take the pleasant opportunity to make some acknowledgements.

makam

Most people know me as a MySQL consultant/instructor/speaker/something. I wear several hats; in one I am co-founder and CTO of SFNK, an Israeli company providing the makam system, a specialized service providing valuable insight on our customers increasing need: the need to know what people say about them on the internet. Our product is running for 7 years now, is serving large amounts of data, and is operating several MySQL server nodes.

I've been working with MySQL for over 11 years, but it is throughout my work in makam that I have come to look deeply into query & server optimization, backups, scale out, design etc. We had some demanding features that pushed me into developing scripts and solutions for MySQL, released as open source. Throughout my experience as a DBA (I'm originally a developer) I came to appreciate the MySQL product, on one had, and find usability fallacies, on the other. The fact that our product is DB centric put me in the position to learn as much as I could about SQL.

I thank makam for this. This product has been the test bed for many of my experiments. Thankfully I never erased the database!

The MySQL community

Though I just recently wrote this, I'm happy to acknowledge it again.

Since my work is based off open source products, open source projects, free advice, professional blogs, I have gained much from the open source world in general, and from the MySQL community in particular, that made me want to share some back.

Its wonderful to have people read and comment on my blog posts and ideas; download, use and provide feedback on code I write; attend an occasional talk. All this pushes me in my work. I honestly hope to have provided with valuable content, code or assistance, to just anyone. Anyone and someone who would benefit from the richness of open source and knowledge sharing.

I am humbled by the advantage I got from the community. I have found names behind the community's first anonymous appearance. I've found people who are easy to reach, happy to respond, generous in their advice, experts in their field.

Thank you.

Oracle & MySQL

An observation on this award is that it is given to a MySQL technologist. I have never worked with Oracle (as in the RDBMS) technologies, and my entire nomination is based on my experience with MySQL.

I see significance in this, and believe it reaffirms the focus and attention MySQL gets from Oracle. It makes me happy as a MySQL user and as a MySQL community member.

I thank Oracle for giving me this honor.


PlanetMySQL Voting: Vote UP / Vote DOWN

05.12. Doctrine 2

Декабрь 5th, 2011

Introduction

Object-relational mapping (ORM) frameworks have been around for several years now and for some people, ORM is already outdated by now. As we have seen with other technologies and concepts before, PHP is not exactly what we call an early adopter among the programming languages. Thus it took some time for ORM to grow up in the PHP context.

There have been some frameworks before Doctrine 2 that implement ORM (remember e.g. Propel) specific tasks but most of them lack the required maturity to be used in large projects. With Doctrine 2, PHP takes a huge step into the right direction – Doctrine 2 is fast, extensible and easy to use.

This article will take you on a tour through the main concepts of Doctrine 2 in the first part and then explain how to use it in a real world application in the second part. Since at the time of writing Zend Framework 1.11.xx (ZF) is very popular, we will integrate Doctrine 2 into a ZF project.


Basic Concepts

To understand Doctrine 2, we have to take a look at some relevant terms (or in this case objects), study their behavior and practice their usage. We start with some introductory phrases on ORM systems and then go on to the concepts underlying Doctrine 2: Entity Objects, the Entity Manager, Repositories and Proxies.


Object-relational Mapping

Since the beginning of Object-Orientation, people had to manage the persistence of their application's state resp. their objects. In the context of Web Application Development, this usually involves a Database server which is being consulted using a Query Language. One example for this pattern is a PHP application that uses some kind of SQL server by sending SQL queries to it. Another one is an application using a CouchDB server by querying it via its REST API.
Due to the author's laziness, we will talk in terms of relational databases from now on. Keep in mind, that you can accomplish almost everything mentioned here with NoSQL databases, too.

ORM relates value objects that exist in an application's business logic to database records. Thus every object that should be persistent is saved in one row of a database table. The most common approach is to map classes to tables and the classes' objects to rows in the these tables.
Besides writing objects to a database, ORM systems are also intended to ease the process of finding data stored in the database. When talking in terms of ORM, finding data always means making the framework fetch one or many objects that meet a certain criteria.


Entity Objects

The objects that are being managed by an ORM system are called Entity Objects. Every entity object relates to one entry in a table. In Doctrine 2, the classes that represent entities do not have to fulfill special requirements like inheriting from a certain super class (as you might have seen in other database abstraction frameworks like Zend_Db). When creating a new entity class with Doctrine 2, all you have to do is to write down a regular PHP class with properties. Besides this, you have to provide some hints on how these attributes should be persisted. The information how entity attributes relate to columns in the DB is called Metadata. Metadata can be described in different ways: By default there are metadata drivers for descriptions in XML, YAML and PHP. The fourth and most popular driver is based on DocBlock annotations (since in PHP, annotations aren't a language feature as in Java (see Wikipedia), they are contained by the classes' and attributes' DocBlocks). We will use annotations to describe our entities metadata. To get an impression on how easy this is, take a look at the following example.



This example contains all it needs to tell Doctrine 2 about the new entity User. With this class, you can create, find, delete and modify user objects and persist their state to the underlying database. But keep in mind: as long as you don't need any persistence features, you can use your user objects just like any other objects!

The next two objects resp. object types we will describe are responsible for doing the ORM functionality: persisting and finding.


The Entity Manager

To use ORM functionality, the Entity Manager (Doctrine\ORM\EntityManager) is the main access point to Doctrine 2. The entity manager is responsible – as you might have guessed – for managing entities and for building a facade for the whole framework. To accomplish its tasks, the entity manager uses some helpers. The Unit of Work object for example collects entities that should be written back to the database and is capable of doing this in batches. This way, database operation can be executed with almost no overhead and therefore are really fast.

Another dependency of the entity manager is the Event Manager. To be as extensible as possible, Doctrine 2 comes with an event system that publishes all important state changes to the outside as events. You can register for such events and extend the life cycle of your entity objects at one single point.

The entity manager's API combines methods for managing entities (find, persist, contains, copy, detatch, merge, remove and refresh), methods that control the use of transations (beginTransaction, commit, flush, rollback and transactional) and some helper methods for creating custom queries and accessing some of the entity manager's dependencies.

The following example shows how to query an object from the entity manager, modify it and write the changes back into the database.



Creating a new persistent object is almost as easy as modifying it:




Repositories

For finding entities, Repositories are used. Every entity class has its own repository which is responsible for finding entities of that type. By default, repositories have some handy methods for fetching entities that match certain criteria:

  • find: Finds an entity by its primary key / identifier
  • findAll: Finds all entities of the repository's entity type
  • findBy / findOneBy: Finds all resp. one entity that matches the passed criteria:
  • findBy<attribute> / findOneBy<attribute>: Magic methods that ease the filtering by a single attribute:

To access a repository, all you have to do is ask the entity manager for one. If you have implemented your own repository, it will be returned by Doctrine\ORM\EntityManager::getRepository(). Otherwise, Doctrine 2 will provide a generic repository. The main reason to implement custom repository classes is to group custom queries for an entity type to make them reusable. For custom query logic, there are several mechanisms you can use: You can either use Doctrine's query builder that implements an API similar to Zend_Db_Select or queries written in the Doctrine Query Language (DQL) or you can even execute plain SQL queries. With these options, it is also possible to migrate old applications which use complex queries by just wrapping these queries into the methods of custom repositories.


Proxies

When traversing a graph of entity objects (which is required when entities are having relations to other entities), it would be very expensive (in the sense of “requiring many database queries”) to fetch every depending entity with an additional query. Therefore Doctrine 2 uses the concept of Proxy objects that represent regular entity objects which have not been populated with data from the database. Take a look at the following example where the entity Group aggregates a list of User objects in its member property. When accessing the members list, Doctrine 2 provides a collection of proxy objects instead of complete User objects. When an object of this collection is being asked for one of its properties, Doctrine loads the object's data from the database. This way, the users' data is not loaded until it is really needed.




Advanced Mapping Concepts

This section describes some advanced concepts that are required when mapping entity classes that have relationships to other entity classes. Possible relationship types are association and inheritance. Inheritance is the mechanism used for representing subtypes in object-oriented programming languages. An example would be a class User that implements methods every user of a software should have and a class Administrator inheriting from User that adds methods for determining the administrator's access rights.

Association is a weaker relation type. It means that an entity object can be related to other entity objects of other types. In terms of relational databases, there are three types of association which differ in the number of entities an object is related to: 1:1, 1:n and n:m relationships. n and m are placeholders and mean multiple.


Association

To put objects of an entity type into relation, you just have to mention this relation in the entity class' mapping information. The simplest case is a unidirectional 1:1 relationship. In the following example we describe a User entity which has its access information (user credentials) encapsulated into another entity class called UserCredential. Since every user has at most one credential object and every credential object may only be associated to one user object, this is a 1:1 relationship.



If the relationship should be bidirectional, include the OneToOne attribute in the other class, too, and add an attribute which denotes the attribute of the other entity that mapps the related object:



This way, you can access the user object from the credentials object, too.
Most of the times, developers have to deal with relationships which include many objects on at least one side. These relationships are called 1:n or n:m relationships. This means that either one or multiple entities are standing in relationship with an arbitrary number of entities of another type. To accomplish this, you have to use the mapping keywords OneToMany or ManyToMany when describing your entities. Besides that, the mapping works the exact same way as with 1:1 relationships.

There are however some tricks you should know when dealing with collections of associated entity objects. Consider the following relationship between the entity classes User and Group:



When a group has at least one member, the group object will have a collection of the type Doctrine\Common\Collections\ArrayCollection set as its members property. This collection contains all user objects (or proxy objects as we have seen before) and can be modified intuitively with the methods add and removeElement. To honor object-orientation, you might want to introduce custom methods for these tasks. If you do so, you get into trouble when the group object does not have any users associated. In this case, the collection will simply be set to null. To avoid checks whether the collection has already be initialized, you should to this by yourself in the entity class' constructor:



It is also important to notice that one entity has to update the other entity's state as well when a relationship between to objects is created or removed. Take care to do this only in one class to avoid endless recursion loops! This class is called the Owning Side of the relationship. When implementing a bidirectional relationship, the other class is called the Inverse Side. It is important to determine owning and inverse side and implement the the classes accordingly to avoid greater trouble during debugging.

There are some more features implemented by Doctrine 2 enabling developers to specify their entities' relationships including sorting, pre-fetching and indexing. These topics are not covered in this article but are explained very understandable in the Doctrine 2 documentation.


Inheritance

Subtyping can be implemented in different ways using Doctrine 2. The main difference between these implementations is how the inheritance is mapped to the database. The options are to have one table for every class (Class Table Inheritance), to have one table for all classes in a hierarchy (Single Table Inheritance) and to have a table for every specialized sub-class of a given super-class (Mapped Super Class).We will give a short overview on all three alternatives, you have to pick the right one yourself. This decision should be made based on how many common attributes there are in your sub-classes.

Mapped Superclasses

Introducing a mapped superclass is probably the easiest way for specifying inheritance but might lead to many duplicate columns in your database schema. The superclass of your entities is not being declared as an entity itself (and might also be declared abstract) but provides attributes and optionally methods that will be available in all subclasses. When creating the database schema, Doctrine 2 merges all attributes and relationships of the superclass into the definitions of the subclasses and processes them as regular entities.



After creating the database from this mapping information, your tables will look like this:

Single Table Inheritance

When having entities that are very similar besides some few attributes, you might want to store them together in one database table. This approach is called Single Table Inheritance. To distinguish between the different types, there is always a column marked as discriminator column and a discriminator map that tells Doctrine 2 which values in the discriminator indicate what entity types.



These definitions cause the existing of one single table called User with all the attributes declared inside the classes User and Administrator plus a column type – the discriminator column. When working with entities of these types, Doctrine will manage the type flag automatically for you.

The resulting database schema looks as illustrated by the following diagram:

Class Table Inheritance

Having each entity type stored in its own table is always good for keeping your schema extensible. When you have to create a new subtype, Doctrine 2 will just create a new table for this type and it can inherit the logic and common attributes of a superclass. The only overhead you have with this approach is that all tables that correspond to subtypes have to maintain a relationship to their supertype's table. Using class table inheritance, the example with the entities User and Administrator looks like this:



Besides the inheritance type, there is no difference to the example using single table inheritance. The outcome on the resulting database scheme is huge. Now you have to separate tables which store users and administrators. Every record in the table Administrator has a corresponding record in the User table.


This was the first part of this article. Stay tuned for part II which will be published tomorrow (on 6th of December 2011)! In the second part, we will integrate Doctrine 2 into a Zend Framework application and include a generic sandbox (ZF-)project with Doctrine 2!


PlanetMySQL Voting: Vote UP / Vote DOWN

More MySQL foreach()

Декабрь 2nd, 2011

In my previous post I've shown several generic use cases for foreach(), a new scripting functionality introduced in common_schema.

In this part I present DBA's handy syntax for schema and table operations and maintenance.

Confession: while I love INFORMATION_SCHEMA's power, I just hate writing queries against it. It's just so much typing! Just getting the list of tables in a schema makes for this heavy duty query:

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='sakila' AND TABLE_TYPE='BASE TABLE';

When a join is involved this really becomes a nightmare. I think it's cumbersome, and as result, many do not remember the names and meaning of columns, making for "oh, I need to read the manual all over again just to get that query right". Anyway, that's my opinion.

A SHOW TABLES statement is easier to type, but cannot be integrated into a SELECT query (though we have a partial solution for that, too), and besides, when filtering out the views, the SHOW statement becomes almost as cumbersome as the one on INFORMATION_SCHEMA.

Which is why foreach() offers handy shortcuts to common iterations on schemata and tables, as follows:

Use case: iterate all databases

call foreach('schema', 'CREATE TABLE ${schema}.event(event_id INT, msg VARCHAR(128))');

In the above we execute a query on each database. Hmmm, maybe not such a good idea to perform this operation on all databases? Let's filter them:

Use case: iterate databases by name match

call foreach('schema like wordpress_%', 'ALTER TABLE ${schema}.wp_posts MODIFY COLUMN comment_author VARCHAR(96) NOT NULL');

The above will only iterate my WordPress databases (I have several of these), performing an ALTER on wp_posts for each of those databases.

I don't have to quote the like expression, but I can, if I wish to.

I can also use a regular expression match:

call foreach('schema ~ /^wordpress_[0-9]+$/', 'ALTER TABLE ${schema}.wp_posts MODIFY COLUMN comment_author VARCHAR(96) NOT NULL');

Use case: iterate tables in a specific schema

Time to upgrade our sakila tables to InnoDB's compressed format. We use $(), a synonym for foreach().

call $('table in sakila', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPRESSED');

The above will iterate on tables in sakila. I say tables, since it will avoid iterating views (there is still no specific syntax for views iteration). This is done on purpose, as my experience shows there is very little in common between tables and views when it comes to maintenance and operations.

Use case: iterate tables by name match

Here's a interesting scenario: you wish to work on all tables matching some name. The naive approach would be to:

SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'wp_posts' AND TABLE_TYPE = 'BASE TABLE'

Wait! Are you aware this may bring your server down? This query will open all databases at once, opening all .frm files (though thankfully not data files, since we only check for name and type).

Here's a better approach:

call foreach('table like wp_posts', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB');

(There's now FULLTEXT to InnoDB, so the above can make sense in the near future!)

The good part is that foreach() will look for matching tables one database at a time. It will iterate the list of database, then look for matching tables per database, thereby optimizing the query on INFORMATION_SCHEMA.

Here, too, I can use regular expressions:

call $('table ~ /^wp_.*$/', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB');

Conclusion

This is work in the making, but, as someone who maintains a few productions servers, I've already put it to work.

I'm hoping the syntax is easy to comprehend. I know that since I developed it it must be far more intuitive to myself than to others. I've tried to keep close on common syntax and concepts from various programming languages.

I would like to get as much feedback as possible. I have further ideas and thoughts on the direction common_schema is taking, but wish take it in small steps. Your feedback is appreciated!


PlanetMySQL Voting: Vote UP / Vote DOWN

common_schema, rev. 178: foreach(), repeat_exec(), Roland Bouman, query analysis

Декабрь 1st, 2011

common_schema, revision 178 is now released, with major additions. This revision turns common_schema into a framework, rather than a set of views and functions.

common_schema provides with query scripting, analysis & informational views, and a function library, allowing for easier administration and diagnostics for MySQL. It introduces SQL based tools which simplify otherwise complex shell and client scripts, allowing the DBA to be independent of operating system, installed packages and dependencies.

There's no Perl nor Python, and no dependencies to install. It's just a schema.

Some highlights for the new revision:

  • foreach(), aka $(): loop through a collection, execute callback commands per element.
  • repeat_exec(): a repeat-until device: execute queries until some condition holds.
  • exec_file(): execute files a-la SOURCE, but on server side
  • Query analysis: analyze query text, view or routine definitions to detect dependency objects.
  • Improvements to views and routines, new routines introduced.

Let's take a closer look:

rpbouman

I'm very happy to have Roland Bouman working on this project. He introduced some sophisticated code without which some functionality could not take place. I'm sure I don't need to introduce his great capabilities; I'll just pass the note that it is very good working with him!

foreach()

Introducing a looping device which can iterate a collection and execute callback commands.

What's a collection? A range of numbers; a set of constants; the result set of a SELECT query; tables in your database and more.

What is a callback? A query or set of queries to invoke on the specific elements in the collection. For example:

call foreach('table in sakila', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPRESSED');

I'll publish dedicated posts on foreach(), aka $(), following this post. Official documentation is here.

repeat_exec()

Repeat executing queries in a given interval, until some condition holds.

What kind of condition? You can loop forever, or until a given time has passed, a given number of iteration has passed.

You can iterate until no rows are affected by your commands (your callbacks), or until some dynamic condition holds (a query evaluates to true).

For example: purge rows from a table until no more rows are affected; in interval of 3 second:

call repeat_exec(3, 'DELETE FROM test.event WHERE ts < CURDATE() ORDER BY id LIMIT 1000', 0);

Official documentation is here.

exec_file()

If you need to execute commands from a file, you usually invoke SOURCE:

mysql> SOURCE '/tmp/somefile.sql';

Or you invoke mysql client and redirect its input to read from file:

bash$ mysql some_db < /tmp/somefile.sql

SOURCE is a MySQL client command. The file must reside on your client. Running the mysql client is great, but you need to work it out from outside the server.

call_exec() will let you import a file on server side, from within the server:

call exec_file('/tmp/some_file.sql');

You will need to have the file readable; it is limited to 64K at this moment; it may not use DELIMITER, and it may not include dynamic SQL. These are the limitations.

Official documentation is here.

exec() / exec_single()

All of the above rely on the exec() / exec_single() routines, which dynamically execute a set of queries. One one hand, it's no big deal: they only have to use prepared statements in order to invoke the queries. But then, they knows how to parse multiple queries (find the ";" delimiter correctly), plus they allow for configuration: if you set @common_schema_dryrun, queries are not actually executes; just printed out. If you set @common_schema_verbose, queries are verbosed in addition to being executed. Since all execution routines rely on these,we get a standardized pattern.

Official documentation is here.

Query analysis

Query parsing routines allow for detection of dependencies within queries. While not full-blown SQL parser, these allow one to realize on which tables or routines a view depends on; or a routines depends on; or an event; or just any query.

These routines can analyze the text of not only a SELECT query, but also UPDATE, DELETE, CREATE, etc. They can read the code of a stored routines, including queries and control flow constructs; thus, they are also able to analyze events and triggers.

At this stage forward-dependencies resolution is supported. This can eventually lead to dependency graphs or to reverse-dependency resolution (i.e. "which view, routine, trigger or event depend on table t?")

Example:

mysql> call get_view_dependencies('sakila', 'actor_info');
+-------------+---------------+-------------+--------+
| schema_name | object_name   | object_type | action |
+-------------+---------------+-------------+--------+
| sakila      | actor         | table       | select |
| sakila      | category      | table       | select |
| sakila      | film          | table       | select |
| sakila      | film_actor    | table       | select |
| sakila      | film_category | table       | select |
+-------------+---------------+-------------+--------+

The query analysis routines are in BETA stage.

Official documentation is here.

Test quite

common_schema is now tested. Not all code is as yet under tests; all new code is, and some of the older code. Work is in progress to add more and more tests.

Further changes:

  • candidate_keys does not give higher score for PRIMARY KEYs any longer. It ranks all unique keys according to its own heuristic; it also provides with the  is_primary and is_nullable columns.
  • Added candidate_keys_recommended view, recommending best candidate key per table (while noting whether it qualifies as PRIMARY KEY in terms of NULLable columns).
  • Added many text parsing and text manipulation routines, such as better trim, tokenizing, etc. Improved existing code significantly.

Get it

common_schema is available for downloaded. It is released under the BSD license, and is free.

I've put very hard work into common_schema's documentation. It is very thorough and provides with clear examples. The documentation is also available for download.

If you encounter problems, please report on the issues page.

common_schema is meant to be downloaded & installed on any MySQL server. It provides with general and essential functionality. Spread the word!


PlanetMySQL Voting: Vote UP / Vote DOWN

Writing a MariaDB PAM Authentication Plugin

Ноябрь 17th, 2011

As you know, since version 5.2.0 (released in April 2010) we support Pluggable Authentication. Using this feature one can implement an arbitrary user authentication and account managing policy, completely replacing built-in MariaDB authentication with its username/password combination and mysql.user table.

Also, as you might have heard, Oracle has recently released a PAM authentication plugin for MySQL. Alas, this plugin will not run on MariaDB — although MySQL implementation of pluggable authentication is based on ours, the API is incompatible. And, being closed source, this plugin cannot be fixed to run in MariaDB. And — I’m not making it up — this plugin does not support communication between the client and the server, so even with this plugin and all the power of PAM the only possible authentication method remains username/password combination.

But writing authentication plugins is easy, I said to myself. I will do my own authentication plugin! With blackjack and hookers.

I start from installing the development headers:

sudo rpm -ivh MariaDB-devel-5.2.9-102.el5.x86_64.rpm

On Debian or Ubuntu you would’ve need to install libmariadbclient-dev. By the way, a disclaimer — I’m doing it for MariaDB-5.2, but with minimal changes this plugin can work with MySQL-5.5 too.

Now I create a working directory and, being a lazy guy, copy auth_socket plugin  sources — one of authentication plugins that come with MariaDB — from Launchpad (only auth_socket.c). Stripped down, with the old code removed, it becomes my pam.c:

#define MYSQL_DYNAMIC_PLUGIN
#include <mysql/plugin_auth.h>

static int pam_auth(MYSQL_PLUGIN_VIO *vio, MYSQL_SERVER_AUTH_INFO *info)
{
}

static struct st_mysql_auth pam_info =
{
  MYSQL_AUTHENTICATION_INTERFACE_VERSION,
  "dialog",
  pam_auth
};

mysql_declare_plugin(pam)
{
  MYSQL_AUTHENTICATION_PLUGIN,
  &pam_info,
  "pam",
  "Sergei Golubchik",
  "PAM based authentication",
  PLUGIN_LICENSE_GPL,
  NULL,
  NULL,
  0x0100,
  NULL,
  NULL,
  NULL
}
mysql_declare_plugin_end;

At the end of the file we have the plugin descriptor — it always has the same structure for all plugin types. Above it — authentication plugin descriptor, it tells MariaDB what function performs the actual authentication, and what plugin should the client use.

Let me repeat — what plugin should the client use. Indeed, an authentication process is always a dialog. The server asks questions (“username?”, “password?”), the client answers them. Because a loadable plugin may cause the server to ask the most unexpected question (“the fingerprint of the left index finger?”), the client should support plugins too — that know how to answer them. And it does support them — or, more precisely, libmysqlclient does, automatically and transparently for the client applications.

In this particular case, though, the questions aren’t very exotic. PAM may only ask the end user to enter some text, so the client plugin needs to be able to print a prompt text, read the user’s input, and send it back to the server. And repeat until the server is satisfied. Luckily, MariaDB already has a plugin to perform such a dialog with the user. The plugin is called, not surprisingly, dialog, and in the my plugin descriptor, I specify that pam server plugin needs client to load dialog plugin to be able to continue the authentication.

Now, let’s see if this plugin skeleton works:

$ gcc -o pam.so pam.c `mysql_config --cflags` -shared -fPIC -lpam

It compiles and even loads into the server. Looks good so far, and I open man pam.

According to the man pages, to perform a PAM authentication one needs to do the following:

  1. initialize the PAM subsystem with the pam_start() function.
  2. invoke pam_authenticate() that performs the actual authentication
  3. verify user’s account with the pam_acct_mgmt()
  4. in the process of authentication, PAM can change the user name. Retrieve the new name with pam_get_item(PAM_USER)
  5. at the end one should always call pam_end()

To talk to the client, PAM allows to specify a conversation function — my function that PAM will invoke as necessary.

I put the above logic into the main pam_auth() function:

#include <string.h>
#include <security/pam_modules.h>
#include <security/pam_appl.h>

static int conv(int n, const struct pam_message **msg,
                struct pam_response **resp, void *data)
{
}

#define DO(X) if ((status = (X)) != PAM_SUCCESS) goto end

static int pam_auth(MYSQL_PLUGIN_VIO *vio, MYSQL_SERVER_AUTH_INFO *info)
{
  pam_handle_t *pamh = NULL;
  int status;
  const char *new_username;
  struct param param;
  struct pam_conv c = { &conv, &param };

  /* get the service name, as specified in

     CREATE USER ... IDENTIFIED WITH pam_auth AS  "service"

  */
  const char *service = info->auth_string ? info->auth_string : "mysql";

  param.ptr = param.buf + 1;
  param.vio = vio;

  DO( pam_start(service, info->user_name, &c, &pamh) );
  DO( pam_authenticate (pamh, 0) );
  DO( pam_acct_mgmt(pamh, 0) );
  DO( pam_get_item(pamh, PAM_USER, (const void**)&new_username) );

  if (new_username)
    strncpy(info->authenticated_as, new_username,
            sizeof(info->authenticated_as));

end:
  pam_end(pamh, status);
  return status == PAM_SUCCESS ? CR_OK : CR_ERROR;
}

The plugin is almost done. The only missing bit is the conversation function conv(). According to the PAM documentation, it will be invoked with an array of “questions”, that should be shown to the user, and it must return his answers. Additionally, it will get one opaque pointer argument — callback functions almost always have it in almost all APIs in the world. From this function I will send the “questions” to the client, and receive the answers. The dialog plugin on the client side will do the actual communication with the user.

Sending and receiving is easy in Pluggable Authentication API. One of the arguments of the main authentication function — pam_auth() in our case — is a so called vio handle. This handle provides read_packet() and write_packet() functions, that the client and server plugins can use to communicate with each other. The server will take care of everything else — delivering packets, splitting and reassembling them, encrypting (if SSL is used), using unix sockets, tcp/ip, named pipes, shared memory, making sure that the server plugin talks to a right client plugin, maintaining backward compatible protocol on the wire, and so on. That’s, by the way, where the name vio comes from — it means Virtual I/O.

There is one last difficulty to overcome. PAM can four different types of messages, two of them being purely informational, with the meaning “print this to user”, and two being input messages, with the meaning “print this and read the reply”. While dialog plugin supports only “print this and read the reply” kind of actions. To solve this API mismatch, our conversation function will accumulate PAM informational messages until it sees the first input message. Then it’ll send all accumulated and concatenated messages to the dialog plugin as one big prompt string, in one packet. This is what I mean:

struct param {
  unsigned char buf[10240], *ptr;
  MYSQL_PLUGIN_VIO *vio;
};

static int conv(int n, const struct pam_message **msg,
                struct pam_response **resp, void *data)
{
  struct param *param = (struct param *)data;
  unsigned char *end = param->buf + sizeof(param->buf) - 1;
  int i;

  for (i = 0; i < n; i++) {
    /* if there's a message - append it to the buffer */
    if (msg[i]->msg) {
      int len = strlen(msg[i]->msg);
      if (len > end - param->ptr)
        len = end - param->ptr;
      memcpy(param->ptr, msg[i]->msg, len);
      param->ptr+= len;
      *(param->ptr)++ = '\n';
    }
    /* if the message style is *_PROMPT_*, meaning PAM asks a question,
       send the accumulated text to the client, read the reply */
    if (msg[i]->msg_style == PAM_PROMPT_ECHO_OFF ||
        msg[i]->msg_style == PAM_PROMPT_ECHO_ON) {
      int pkt_len;
      unsigned char *pkt;

      /* allocate the response array.
         freeing it is the responsibility of the caller */
      if (*resp == 0) {
        *resp = calloc(sizeof(struct pam_response), n);
        if (*resp == 0)
          return PAM_BUF_ERR;
      }

      /* dialog plugin interprets the first byte of the packet
         as the magic number.
           2 means "read the input with the echo enabled"
           4 means "password-like input, echo disabled"
         C'est la vie. */
      param->buf[0] = msg[i]->msg_style == PAM_PROMPT_ECHO_ON ? 2 : 4;
      if (param->vio->write_packet(param->vio, param->buf, param->ptr - param->buf - 1))
        return PAM_CONV_ERR;

      pkt_len = param->vio->read_packet(param->vio, &pkt);
      if (pkt_len < 0)
        return PAM_CONV_ERR;
      /* allocate and copy the reply to the response array */
      (*resp)[i].resp = strndup((char*)pkt, pkt_len);
      param->ptr = param->buf + 1;
    }
  }
  return PAM_SUCCESS;
}

That’s all. Now I can compile it as above (repeating the process twice, because I forgot -lpam the first time), load it, configure PAM to use pam_skey for the “mysql” service, create the user and, finally, login:

$ mysql -u root
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 1
Server version: 5.2.9-MariaDB-debug Source distribution

MariaDB []> CREATE USER serg IDENTIFIED VIA pam USING 'mysql';
Query OK, 0 rows affected (0.00 sec)

MariaDB []> ^DBye
$ mysql -u serg
challenge otp-md5 99 th91334
password: <Enter>
(turning echo on)
pasword: OMEN US HORN OMIT BACK AHOY
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 2
Server version: 5.2.9-MariaDB-debug Source distribution

MariaDB []> SELECT "Hey-ho! It works!!!


PlanetMySQL Voting: Vote UP / Vote DOWN

Bzr and launchpad tricks: firefox plugin

Ноябрь 7th, 2011

If you work with bazaar, you have seen its URIs. You can find the complete list is in the bzr help urlspec. Although I commonly use only a subset of that, like bzr+ssh://bazaar.launchpad.net/~maria-captains/maria/5.2-serg/ and http://bazaar.launchpad.net/%2Bbranch/mysql-server/5.5/.

In addition I often use Launchpad aliases, such as lp:~maria-captains/maria/5.3-serg/, lp:maria/5.3, and lp:869001.

And finally, there are common abbreviations that we have used in MySQL, and others that we use in MariaDB, for example bug#12345 and wl#90.

What’s annoying, I need to remember that wl#90 corresponds to http://askmonty.org/worklog/?tid=90 and type the latter in the location bar of the browser, when I want to look this task up. And lp:869001 is, for my browser, https://bugs.launchpad.net/bugs/869001. Similarly, every other URL above, has its browser-friendly evil twin. It’s evil, because I have to remember it!

Now, Firefox tries to help, to a certain extent. It supports so-called keywords — short aliases for bookmarks. Create a bookmark for https://bugs.launchpad.net/bugs/%s and in the Keyword field enter lp. Now you can type in the location bar lp 869001 (with a space) and Firefox will expand it into a complete url https://bugs.launchpad.net/bugs/869001. Quite handy. And I’ve used it for a few years. Still it annoyed me, that I had to rewrite the abbreviations manually, put spaces, remove colons, and so on. And at last it annoyed me to a degree where I wrote a Firefox plugin!

Let me introduce a LocationMorph — a plugin that can arbitrarily rewrite the text in the location bar, according to the user specified regular expression. I have configured it (via a Preference dialog) to use the following set of rules

^bzr\+ssh://bazaar.launchpad.net/ http://code.launchpad.net/
^lp:(\d+)$ https://bugs.launchpad.net/bugs/$1
^lp: http://code.launchpad.net/
^wl#(\d+)$ http://askmonty.org/worklog/?tid=$1
^bug#(\d+)$ http://bugs.mysql.com/bug.php?id=$1

And now I can simply copy and paste a bzr URI, or a launchpad alias, or our internal abbreviation from the email (or bzr info) directly into the browser, and it understands it directly and shows me the page that I want. Ahh, perfection…


PlanetMySQL Voting: Vote UP / Vote DOWN