Archive for the ‘Development’ Category

sphinx, sphinx_snippets() & MySQL 5.5

Март 21st, 2012

I've written a patch which completes Sphinx's integration with MySQL 5.5.

Up until a couple months ago, Sphinx would not compile with MySQL 5.5 at all. This is, thankfully, resolved as of Sphinx 2.0.3.

However, to my disdain, I've found out that it only partially work: the sphinx_snippets() user defined function is not included within the plugin library. After some quick poking I discovered that it was not added to the build, and when added, would not compile.

I rely on sphinx_snippets() quite a lot, and like it. Eventually I wrote the fix to the snippets_udf.cc which allows it to run in a MySQL 5.5 server.

Here are the changes for the 2.0.4 version of Sphinx:

Replace your 2.0.4 files with these two and get on compiling your MySQL server.

Compilation guide

For completeness, here's how to compile Percona Server 5.5 with Sphinx 2.0.4 including the above patches:

Get Percona Server source code and Sphinx Search source code.

I'll be using Percona Server 5.5.21-25.0. I use /data/tmp/mysql as compilation path, and install MySQL on /usr/local/mysql55.

mkdir -p /data/tmp/mysql
cd /data/tmp/mysql
tar xzfv Percona-Server-5.5.21-rel25.0.tar.gz
tar xzfv sphinx-2.0.4-release.tar.gz
cd Percona-Server-5.5.21-rel25.0/
cp -R /data/tmp/mysql/sphinx-2.0.4-release/mysqlse storage/sphinx

Overwrite with patched files included in this post:

cp /tmp/CMakeLists.txt storage/sphinx/CMakeLists.txt
cp /tmp/snippets_udf.cc storage/sphinx/snippets_udf.cc

Build MySQL:

sh BUILD/autorun.sh
./configure --with-plugin-sphinx --prefix=/usr/local/mysql55
make
sudo make install

Install the mysql55 service:

cd /usr/local/mysql55
sudo cp support-files/mysql.server /etc/init.d/mysql55

In /etc/bash.bashrc, add:

export PATH=/usr/local/mysql55/bin:${PATH}

Start MySQL:

sudo service mysql55 start

Login to MySQL as an administrato (typically root) and install Sphinx:

mysql> INSTALL PLUGIN sphinx SONAME 'ha_sphinx.so';
mysql> CREATE FUNCTION sphinx_snippets RETURNS STRING SONAME 'ha_sphinx.so';

Notes

See also http://sphinxsearch.com/bugs/view.php?id=1090 and http://sphinxsearch.com/forum/view.html?id=8982


PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Sandbox at the OTN MySQL Developers day in Paris, March 21st

Март 13th, 2012
On March 21st I will be in Paris, to attend the OTN MySQL Developers Day.Oracle is organizing these events all over the world, and although the majority are in the US, some of them are touching the good old European continent.Previous events were an all-Oracle show. Recently, the MySQL Community team has been asking for cooperation from the community, and in such capacity I am also presenting at the event, on the topic of testing early releases of MySQL in a sandbox. Of course, this is one of my favorite topics, but it is quite appropriate in this period, when Oracle has released a whole lot of preview features in its MySQL Labs. Which is another favorite topic of mine, since I was the one who insisted for having the Labs when I was working in the community team. It's nice to see that the labs are still in place, and being put to good use.

MySQL Sandbox

Speaking of sandboxes, I was making some quick tests yesterday, and I installed 15 sandboxes at once (all different versions, from 5.0.91 to 5.6.5). Installing a single sandbox, depending on the version, takes from 5 to 19 seconds.Do you know how long it takes to install 15 sandboxes, completely, from tarball to working conditions? It takes 19 seconds. How's so? It's because I have been working at a large project where we are dealing with many replicated clusters spread across three continents. Administering these clusters is a problem in itself, and so we are using tools to do our work in parallel. At the same time, using a host with a fast 16 core CPU I can install many sandboxes at once. It's a real joy to see software behaving efficiently the way it should!It works so fast, in fact, that I found a race condition bug. If you install more than one sandbox at once, the MySQL bootstrap process may try to open the same temporary file from two different servers. That's because I did not indicate a dedicated temporary directory for the bootstrap (I was using one only for the installed sandbox). When this happens, you may find that instead of 15 sandboxes you have installed only 9 or 11. So I fixed the bug, by adding --tmpdir to mysql_install_db, and now you can install more than one sandbox in parallel.

PlanetMySQL Voting: Vote UP / Vote DOWN

MySQL Sandbox at the OTN MySQL Developers day in Paris, March 21st

Март 13th, 2012
On March 21st I will be in Paris, to attend the OTN MySQL Developers Day.Oracle is organizing these events all over the world, and although the majority are in the US, some of them are touching the good old European continent.Previous events were an all-Oracle show. Recently, the MySQL Community team has been asking for cooperation from the community, and in such capacity I am also presenting at the event, on the topic of testing early releases of MySQL in a sandbox. Of course, this is one of my favorite topics, but it is quite appropriate in this period, when Oracle has released a whole lot of preview features in its MySQL Labs. Which is another favorite topic of mine, since I was the one who insisted for having the Labs when I was working in the community team. It's nice to see that the labs are still in place, and being put to good use.

MySQL Sandbox

Speaking of sandboxes, I was making some quick tests yesterday, and I installed 15 sandboxes at once (all different versions, from 5.0.91 to 5.6.5). Installing a single sandbox, depending on the version, takes from 5 to 19 seconds.Do you know how long it takes to install 15 sandboxes, completely, from tarball to working conditions? It takes 19 seconds. How's so? It's because I have been working at a large project where we are dealing with many replicated clusters spread across three continents. Administering these clusters is a problem in itself, and so we are using tools to do our work in parallel. At the same time, using a host with a fast 16 core CPU I can install many sandboxes at once. It's a real joy to see software behaving efficiently the way it should!It works so fast, in fact, that I found a race condition bug. If you install more than one sandbox at once, the MySQL bootstrap process may try to open the same temporary file from two different servers. That's because I did not indicate a dedicated temporary directory for the bootstrap (I was using one only for the installed sandbox). When this happens, you may find that instead of 15 sandboxes you have installed only 9 or 11. So I fixed the bug, by adding --tmpdir to mysql_install_db, and now you can install more than one sandbox in parallel.

PlanetMySQL Voting: Vote UP / Vote DOWN

common_schema rev. 218: QueryScript, throttling, processes, documentation

Февраль 8th, 2012

common_schema, revision 218 is released, with major new features, top one being server side scripting. Here are the highlights:

  • QueryScript: server side scripting is now supported by common_schema, which acts as an interpreter for QueryScript code.
  • Throttling for queries is now made available via the throttle() function.
  • Enhancements to processlist-related views, including the new slave_hosts view.
  • Inline documentation/help is available via the help() routine.
  • more...

QueryScript

common_schema makes for a QueryScript implementation for MySQL. You can run server side scripts, interpreted by common_schema, which allow for easy syntax and greater power than was otherwise previously available on the MySQL server. For example:

foreach($table, $schema, $engine: table like '%')
  if ($engine = 'ndbcluster')
    ALTER ONLINE TABLE :$schema.:$table REORGANIZE PARTITION;

QueryScript includes flow control, conditional branching, variables & variable expansion, script throttling and more.

Read more on common_schema's QueryScript implementation.

Query throttling

Throttling for MySQL queries was suggested by means of elaborate query manipulation. It is now reduced into a single throttle function: one can now just invoke throttle(3) on one's query, so as to make the query execute for a longer time, while taking short sleep breaks during operation, easing up the query's demand for resources.

Read more on query throttling.

Process views

The processlist_grantees view provides with more details on the running processes. slave_hosts is a new view, listing hostnames of connected slaves.

Read more on process views.

help()

The common_schema documentation is now composed of well over 100 pages, including synopsis, detailed internals discussion, notes and examples. I can't exaggerate in saying that the documentation took the vast majority of time for this code to release.

The documentation is now made available inline, from within you mysql client, via the help() routine. Want to know more about redundant (duplicate) keys and how to find them? Just type:

call help('redundant');

and see what comes out!

The entire documentation, which is available online as well as a downloadable bundle, is embedded into common_schema itself. It's rather cool.

Tests

common_schema is tested. The number of tests in common_schema is rapidly growing, and new tests are introduced for new features as well as for older ones. There is not yet full coverage for all views, but I'm working hard at it. common_schema is a robust piece of code!

Get it!

Download common_schema on the common_schema project page.

Read the documentation online, or download it as well (or call for help())

common_schema is released under the BSD license.


PlanetMySQL Voting: Vote UP / Vote DOWN

QueryScript: SQL scripting language

Февраль 8th, 2012

Introducing QueryScript: a programming language aimed for SQL scripting, seamlessly combining scripting power such as flow control & variables with standard SQL statements or RDBMS-specific commands.

QueryScript is available fro MySQL via common_schema, which adds MySQL-specific usage.

What does QueryScript look like? Here are a few code samples:

Turn a bulk DELETE operation into smaller tasks. Throttle in between.

while (DELETE FROM archive.events WHERE ts < CURDATE() LIMIT 1000)
{
  throttle 2;
}

Convert all InnoDB tables in the 'sakila' database to compressed format:

foreach ($table, $schema, $engine: table in sakila)
{
  if ($engine = 'InnoDB')
    ALTER TABLE :$schema.:$table ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
}

Shard your data across multiple schemata:

foreach($shard: {USA, GBR, JAP, FRA})
{
  CREATE DATABASE db_:$shard;
  CREATE TABLE db_:$shard.city LIKE world.City;
  INSERT INTO db_:$shard.city SELECT * FROM world.City WHERE CountryCode = $shard;
}

This tight integration between script and SQL, with the power of iteration, conditional statements, variables, variable expansion, throttling etc., makes QueryScript a power tool, with capabilities superseding those of stored routines, and allowing for simplified, dynamic code.

QueryScript code is interpreted. It's just a text, so it can be read from a @user_defined_variable, a table column, text file, what have you. For example:

mysql> set @script := "while (TIME(SYSDATE()) < '17:00:00') SELECT * FROM world.City WHERE id = 1 + FLOOR((RAND()*4079));";
mysql> call run(@script);

For more details, consult the QueryScript site.

If you're a MySQL user/DBA, better read the common_schema QueryScript documentation, to better understand the specific common_schema implementation and enhanced features.

common_schema, including the QueryScript interpreter, can be downloaded from the common_schema project page.


PlanetMySQL Voting: Vote UP / Vote DOWN

Documentation in SQL: CALL for help()

Январь 11th, 2012

Documentation is an important part of any project. On the projects I maintain I put a lot of effort on documentation, and, frankly, the majority of time spent on my projects is on documentation.

The matter of keeping the documentation faithful is a topic of interest. I'd like to outline a few documentation bundling possibilities, and the present the coming new documentation method for common_schema. I'll talk about any bundling that is NOT man pages.

High level: web docs

This is the initial method of documentation I used for openark kit and mycheckpoint. It's still valid for mycheckpoint. Documentation is web-based. You need Internet access to read it. It's in HTML format.

Well, not exactly HTML format: I wrote it in WordPress. Yes, it's HTML, but there's a lot of noise around (theme, menus, etc.) which is not strictly part of the documentation.

While this is perhaps the easiest way to go, here's a few drawbacks:

  • You're bound to some framework (WordPress in this case)
  • Docs are split between MySQL database (my underlying WordPRess storage) & WordPress files (themes, style, header, footer etc.)
  • Documentation is separate from your code - they're just not in the same place
  • There is no version control over the documentation.

The result is a single source of documentation, which applies to whatever version is latest. It's impossible to maintain docs for multiple versions. You must manually synchronize your WordPress updates with code commits (or rather - code release!).

Mid level: version controlled HTML docs

I first saw this approach on Baron's Aspersa gets a user manual post. I loved it: the documentation is HTML, but stored as part of your project's code, in same version control.

This means one can browse the documentation (openark kit in this example) exactly as it appears in the baseline. Depending on your project hosting, one may be able to do so per version.

The approach has the great benefit of having the docs tightly coupled with the code in terms of development. Before committing code, one updates documentation for that code, then commits/releases both together.

You're also not bound to any development framework. You may edit with vim, emacs, gedit, bluefish, eclipse, ... any tool of your choice. It's all down to plain old text files.

Mid level #2: documentation bundling

One thing I started doing with common_schema is to release a doc bundle with the code. So one can download a compressed bundle of all HTML files. That way one is absolutely certain what's the right documentation for revision 178. There's no effort about it: the docs are already tightly coupled with code versions. Just compress and distribute.

Low level: documentation coupled with your code

Perl scripts can be written as Perl modules, in which case they are eligible for using the perldoc convention. You code your documentation within your script itself, as comment. Perldoc can extract the documentation and present in man-like format. Same happens with Python's pydoc. Baron's When documentation is code illustrates that approach. Maatkit (now Percona Toolkit) has been using it for years.

This method has the advantage of having the documentation ready right within your shell. You don't need a browser, nor firewall access. The docs are just there for you in the same environment where you're executing the code.

SQL Low level: CALL for help()

common_schema is a different type of project. It is merely a schema. There's no Perl nor Python. One imports the schema into one's MySQL server.

What's the low-level approach for this type of code?

For common_schema I use three levels of documentation: the mid-level, where one can browse through the versioned docs, the 2nd mid-level, where one can download bundled documentation, and then a low-level approach: documentation embedded within the code.

MySQL's documentation is also built into the server: see the help_* tables within the mysql schema. The mysql command line client allows one to access help by supporting the help command, e.g.

mysql> help create table;

The client intercepts this command (this is not server side command) and searches through the mysql.help_* docs.

With common_schema, I don't have control over the client; it's all on server side. But the code being a schema, what with stored routines and tables, it's easy enough to set up documentation.

As of the next version of common_schema, and following MySQL's method, common_schema provides a help table:

DESC help;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| topic        | varchar(32) | NO   | PRI | NULL    |       |
| help_message | text        | NO   |     | NULL    |       |
+--------------+-------------+------+-----+---------+-------+

And a help() procedure, so that you can call for help(). The procedure will look for the best matching document based on your search expression:

root@mysql-5.1.51> CALL help('match');
+-------------------------------------------------------------------------------+
| help                                                                          |
+-------------------------------------------------------------------------------+
|                                                                               |
| NAME                                                                          |
|                                                                               |
| match_grantee(): Match an existing account based on user+host.                |
|                                                                               |
| TYPE                                                                          |
|                                                                               |
| Function                                                                      |
|                                                                               |
| DESCRIPTION                                                                   |
|                                                                               |
| MySQL does not provide with identification of logged in accounts. It only     |
| provides with user + host:port combination within processlist. Alas, these do |
| not directly map to accounts, as MySQL lists the host:port from which the     |
| connection is made, but not the (possibly wildcard) user or host.             |
| This function matches a user+host combination against the known accounts,     |
| using the same matching method as the MySQL server, to detect the account     |
| which MySQL identifies as the one matching. It is similar in essence to       |
| CURRENT_USER(), only it works for all sessions, not just for the current      |
| session.                                                                      |
|                                                                               |
| SYNOPSIS                                                                      |
|                                                                               |
|                                                                               |
|                                                                               |
|        match_grantee(connection_user char(16) CHARSET utf8,                   |
|        connection_host char(70) CHARSET utf8)                                 |
|          RETURNS VARCHAR(100) CHARSET utf8                                    |
|                                                                               |
|                                                                               |
| Input:                                                                        |
|                                                                               |
| * connection_user: user login (e.g. as specified by PROCESSLIST)              |
| * connection_host: login host. May optionally specify port number (e.g.       |
|   webhost:12345), which is discarded by the function. This is to support      |
|   immediate input from as specified by PROCESSLIST.                           |
|                                                                               |
|                                                                               |
| EXAMPLES                                                                      |
|                                                                               |
| Find an account matching the given use+host combination:                      |
|                                                                               |
|                                                                               |
|        mysql> SELECT match_grantee('apps', '192.128.0.1:12345') AS            |
|        grantee;                                                               |
|        +------------+                                                         |
|        | grantee    |                                                         |
|        +------------+                                                         |
|        | 'apps'@'%' |                                                         |
|        +------------+                                                         |
|                                                                               |
|                                                                               |
|                                                                               |
| ENVIRONMENT                                                                   |
|                                                                               |
| MySQL 5.1 or newer                                                            |
|                                                                               |
| SEE ALSO                                                                      |
|                                                                               |
| processlist_grantees                                                          |
|                                                                               |
| AUTHOR                                                                        |
|                                                                               |
| Shlomi Noach                                                                  |
|                                                                               |
+-------------------------------------------------------------------------------+

I like HTML for documentation. I think it's a good format, provided you don't start doing funny things. Perhaps TROFF is more suitable; certainly more popular on Unix machines. But I already have everything in HTML. So, what do I do?

My decision was to keep documentation in HTML, and use the handy html2text tool to do the job. And it does it pretty well! The sample you see above is an automated translation of HTML to plain text.

I add a few touches of my own: SELECTing long texts is ugly, whether you do it via ";" or "\G". The help() routine breaks the text by '\n', returning a multi row result set. The above sample makes for some 60+ rows, nicely formatted, broken from the original single text appearing in the help table.

So now you have an internal help method for common_schema, right where the code is. You don't have to leave the command line client in order to get help.

Giuseppe offered me the idea for this, even while my own thinking about it was in early stages.

The next version of common_schema will be available in a few weeks. The code is pretty much ready. I just need to work on, ahem..., the documentation.


PlanetMySQL Voting: Vote UP / Vote DOWN

Quick (and dirty) Patch for Ruby Enterprise Edition 2011.03 to Prevent Hash Collision Attacks

Декабрь 29th, 2011

Since there is no patch for Ruby Enterprise Edition 2011.03 to prevent the Hash Collision DoS Attack, I’ve quickly ported ruby 1.8.7 patchlevel 357 patch. Here it is:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
From e19bd3eaa8bd71cfc9e5bf436527f015b093f31e Mon Sep 17 00:00:00 2001
From: shyouhei <shyouhei@b2dd03c8-39d4-4d8f-98ff-823fe69b080e>
Date: Wed, 28 Dec 2011 12:47:15 +0000
Subject: [PATCH] -This line, and those below, will be ignored--

M    ruby_1_8_7/inits.c
M    ruby_1_8_7/string.c
M    ruby_1_8_7/st.c
M    ruby_1_8_7/test/ruby/test_string.rb
M    ruby_1_8_7/random.c


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_1_8_7@34151 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
---
 ChangeLog                |   26 ++++++++++++++++
 inits.c                  |    4 ++
 random.c                 |   74 +++++++++++++++++++++++++++++++++++----------
 st.c                     |   14 ++++++++-
 string.c                 |    7 ++++-
 test/ruby/test_string.rb |   13 ++++++++
 version.h                |    8 ++--
 7 files changed, 123 insertions(+), 23 deletions(-)

diff --git a/inits.c b/inits.c
index 947bbbe..a0e061f 100644
--- a/inits.c
+++ b/inits.c
@@ -38,6 +38,7 @@
 void Init_sym _((void));
 void Init_process _((void));
 void Init_Random _((void));
+void Init_RandomSeed _((void));
 void Init_Range _((void));
 void Init_Regexp _((void));
 void Init_signal _((void));
@@ -46,10 +47,13 @@
 void Init_Time _((void));
 void Init_var_tables _((void));
 void Init_version _((void));
+void Init_st _((void));
 
 void
 rb_call_inits()
 {
+    Init_RandomSeed();
+    Init_st();
     Init_sym();
     Init_var_tables();
     Init_Object();
diff --git a/random.c b/random.c
index c0560e3..24a0787 100644
--- a/random.c
+++ b/random.c
@@ -189,6 +189,7 @@
 #include <fcntl.h>
 #endif
 
+static int seed_initialized = 0;
 static VALUE saved_seed = INT2FIX(0);
 
 static VALUE
@@ -250,27 +251,22 @@
     return old;
 }
 
-static VALUE
-random_seed()
+#define DEFAULT_SEED_LEN (4 * sizeof(long))
+
+static void
+fill_random_seed(ptr)
+    char *ptr;
 {
     static int n = 0;
+    unsigned long *seed;
     struct timeval tv;
     int fd;
     struct stat statbuf;
+    char *buf = (char*)ptr;
 
-    int seed_len;
-    BDIGIT *digits;
-    unsigned long *seed;
-    NEWOBJ(big, struct RBignum);
-    OBJSETUP(big, rb_cBignum, T_BIGNUM);
-
-    seed_len = 4 * sizeof(long);
-    big->sign = 1;
-    big->len = seed_len / SIZEOF_BDIGITS + 1;
-    digits = big->digits = ALLOC_N(BDIGIT, big->len);
-    seed = (unsigned long *)big->digits;
+    seed = (unsigned long *)buf;
 
-    memset(digits, 0, big->len * SIZEOF_BDIGITS);
+    memset(buf, 0, DEFAULT_SEED_LEN);
 
 #ifdef S_ISCHR
     if ((fd = open("/dev/urandom", O_RDONLY
@@ -285,7 +281,7 @@
 #endif
             )) >= 0) {
         if (fstat(fd, &statbuf) == 0 && S_ISCHR(statbuf.st_mode)) {
-            read(fd, seed, seed_len);
+            read(fd, seed, DEFAULT_SEED_LEN);
         }
         close(fd);
     }
@@ -296,13 +292,37 @@
     seed[1] ^= tv.tv_sec;
     seed[2] ^= getpid() ^ (n++ << 16);
     seed[3] ^= (unsigned long)&seed;
+}
+
+static VALUE
+make_seed_value(char *ptr)
+{
+    BDIGIT *digits;
+    NEWOBJ(big, struct RBignum);
+    OBJSETUP(big, rb_cBignum, T_BIGNUM);
+
+    RBIGNUM_SET_SIGN(big, 1);
+
+    digits = ALLOC_N(char, DEFAULT_SEED_LEN);
+    RBIGNUM(big)->digits = digits;
+    RBIGNUM(big)->len = DEFAULT_SEED_LEN / SIZEOF_BDIGITS;
+
+    MEMCPY(digits, ptr, char, DEFAULT_SEED_LEN);
 
     /* set leading-zero-guard if need. */
-    digits[big->len-1] = digits[big->len-2] <= 1 ? 1 : 0;
+    digits[RBIGNUM_LEN(big)-1] = digits[RBIGNUM_LEN(big)-2] <= 1 ? 1 : 0;
 
     return rb_big_norm((VALUE)big);
 }
 
+static VALUE
+random_seed(void)
+{
+    char buf[DEFAULT_SEED_LEN];
+    fill_random_seed(buf);
+    return make_seed_value(buf);
+}
+
 /*
  *  call-seq:
  *     srand(number=0)    => old_seed
@@ -443,6 +463,9 @@
     long val, max;
 
     rb_scan_args(argc, argv, "01", &vmax);
+    if (!seed_initialized) {
+       rand_init(random_seed());
+    }
     switch (TYPE(vmax)) {
       case T_FLOAT:
    if (RFLOAT(vmax)->value <= LONG_MAX && RFLOAT(vmax)->value >= LONG_MIN) {
@@ -490,6 +513,8 @@
     return LONG2NUM(val);
 }
 
+static char initial_seed[DEFAULT_SEED_LEN];
+
 void
 rb_reset_random_seed()
 {
@@ -497,9 +522,24 @@
 }
 
 void
+Init_RandomSeed(void)
+{
+    fill_random_seed(initial_seed);
+    init_by_array((unsigned long*)initial_seed, DEFAULT_SEED_LEN/sizeof(unsigned long));
+    seed_initialized = 1;
+}
+
+static void
+Init_RandomSeed2(void)
+{
+    saved_seed = make_seed_value(initial_seed);
+    memset(initial_seed, 0, DEFAULT_SEED_LEN);
+}
+
+void
 Init_Random()
 {
-    rand_init(random_seed());
+    Init_RandomSeed2();
     rb_define_global_function("srand", rb_f_srand, -1);
     rb_define_global_function("rand", rb_f_rand, -1);
     rb_global_variable(&saved_seed);
diff --git a/st.c b/st.c
index c16c310..21e157a 100644
--- a/st.c
+++ b/st.c
@@ -9,6 +9,7 @@
 #include <stdlib.h>
 #endif
 #include <string.h>
+#include <limits.h>
 #include "st.h"
 
 typedef struct st_table_entry st_table_entry;
@@ -521,6 +522,8 @@ struct st_table_entry {
     return 0;
 }
 
+static unsigned long hash_seed = 0;
+
 static int
 strhash(string)
     register const char *string;
@@ -550,10 +553,11 @@ struct st_table_entry {
 
     return val + (val << 15);
 #else
-    register int val = 0;
+    register unsigned long val = hash_seed;
 
     while ((c = *string++) != '\0') {
    val = val*997 + c;
+   val = (val << 13) | (val >> (sizeof(st_data_t) * CHAR_BIT - 13));
     }
 
     return val + (val>>5);
@@ -573,3 +577,11 @@ struct st_table_entry {
 {
     return n;
 }
+
+extern unsigned long rb_genrand_int32(void);
+
+void
+Init_st(void)
+{
+    hash_seed = rb_genrand_int32();
+}
diff --git a/string.c b/string.c
index c6b2301..94a0281 100644
--- a/string.c
+++ b/string.c
@@ -875,13 +875,15 @@
     return str1;
 }
 
+static unsigned long hash_seed;
+
 int
 rb_str_hash(str)
     VALUE str;
 {
     register long len = RSTRING(str)->len;
     register char *p = RSTRING(str)->ptr;
-    register int key = 0;
+    register unsigned long key = hash_seed;
 
 #if defined(HASH_ELFHASH)
     register unsigned int g;
@@ -905,6 +907,7 @@
     while (len--) {
    key = key*65599 + *p;
    p++;
+   key = (key << 13) | (key >> ((sizeof(unsigned long) * CHAR_BIT) - 13));
     }
     key = key + (key>>5);
 #endif
@@ -5062,4 +5065,6 @@ struct tr {
     rb_fs = Qnil;
     rb_define_variable("$;", &rb_fs);
     rb_define_variable("$-F", &rb_fs);
+
+    hash_seed = rb_genrand_int32();
 }
diff --git a/test/ruby/test_string.rb b/test/ruby/test_string.rb
index 5f2c54f..4d97182 100644
--- a/test/ruby/test_string.rb
+++ b/test/ruby/test_string.rb
@@ -1,4 +1,5 @@
 require 'test/unit'
+require File.expand_path('envutil', File.dirname(__FILE__))
 
 class TestString < Test::Unit::TestCase
   def check_sum(str, bits=16)
@@ -29,4 +30,16 @@ def test_inspect
   ensure
     $KCODE = original_kcode
   end
+
+  def test_hash_random
+    str = 'abc'
+    a = [str.hash.to_s]
+    cmd = sprintf("%s -e 'print %s.hash'", EnvUtil.rubybin, str.dump)
+    3.times {
+      IO.popen(cmd, "rb") {|o|
+        a << o.read
+      }
+    }
+    assert_not_equal([str.hash.to_s], a.uniq)
+  end
 end

--- a/version.c 2011-12-19 03:22:43.000000000 +0000
+++ b/version.c 2011-12-29 18:18:58.000000000 +0000
@@ -46,7 +46,7 @@
     rb_define_global_const("RUBY_PATCHLEVEL", INT2FIX(RUBY_PATCHLEVEL));
 
     snprintf(description, sizeof(description),
-             "ruby %s (%s %s %d) [%s], MBARI 0x%x, Ruby Enterprise Edition %s",
+             "ruby %s (%s %s %d) [%s], MBARI 0x%x, Ruby Enterprise Edition %s (with hash random)",
              RUBY_VERSION, RUBY_RELEASE_DATE, RUBY_RELEASE_STR,
              RUBY_RELEASE_NUM, RUBY_PLATFORM,
              STACK_WIPE_SITES, REE_VERSION);
--
1.7.5.4

You can view it or download it from github.

Disclaimer: This is provided as is, no guarantees are provided, etc.



PlanetMySQL Voting: Vote UP / Vote DOWN

Oracle "Technologist of the Year: Developer" Award

Декабрь 21st, 2011

I am honored to receive Oracle's Technologist of the Year: Developer award, formerly Oracle Magazine Editors’ Choice Awards.

Technologist of the Year Award is given for individuals for their technical achievements with regard to Oracle products.

As opposed to community based awards, to win this award one must be nominated by himself or his company. There are several award categories: Developer, DBA, IT Manager etc., and many nominations per category. I have been nominated by my company and am happy to have won the award in the Developer category.

Allow me to take the pleasant opportunity to make some acknowledgements.

makam

Most people know me as a MySQL consultant/instructor/speaker/something. I wear several hats; in one I am co-founder and CTO of SFNK, an Israeli company providing the makam system, a specialized service providing valuable insight on our customers increasing need: the need to know what people say about them on the internet. Our product is running for 7 years now, is serving large amounts of data, and is operating several MySQL server nodes.

I've been working with MySQL for over 11 years, but it is throughout my work in makam that I have come to look deeply into query & server optimization, backups, scale out, design etc. We had some demanding features that pushed me into developing scripts and solutions for MySQL, released as open source. Throughout my experience as a DBA (I'm originally a developer) I came to appreciate the MySQL product, on one had, and find usability fallacies, on the other. The fact that our product is DB centric put me in the position to learn as much as I could about SQL.

I thank makam for this. This product has been the test bed for many of my experiments. Thankfully I never erased the database!

The MySQL community

Though I just recently wrote this, I'm happy to acknowledge it again.

Since my work is based off open source products, open source projects, free advice, professional blogs, I have gained much from the open source world in general, and from the MySQL community in particular, that made me want to share some back.

Its wonderful to have people read and comment on my blog posts and ideas; download, use and provide feedback on code I write; attend an occasional talk. All this pushes me in my work. I honestly hope to have provided with valuable content, code or assistance, to just anyone. Anyone and someone who would benefit from the richness of open source and knowledge sharing.

I am humbled by the advantage I got from the community. I have found names behind the community's first anonymous appearance. I've found people who are easy to reach, happy to respond, generous in their advice, experts in their field.

Thank you.

Oracle & MySQL

An observation on this award is that it is given to a MySQL technologist. I have never worked with Oracle (as in the RDBMS) technologies, and my entire nomination is based on my experience with MySQL.

I see significance in this, and believe it reaffirms the focus and attention MySQL gets from Oracle. It makes me happy as a MySQL user and as a MySQL community member.

I thank Oracle for giving me this honor.


PlanetMySQL Voting: Vote UP / Vote DOWN

05.12. Doctrine 2

Декабрь 5th, 2011

Introduction

Object-relational mapping (ORM) frameworks have been around for several years now and for some people, ORM is already outdated by now. As we have seen with other technologies and concepts before, PHP is not exactly what we call an early adopter among the programming languages. Thus it took some time for ORM to grow up in the PHP context.

There have been some frameworks before Doctrine 2 that implement ORM (remember e.g. Propel) specific tasks but most of them lack the required maturity to be used in large projects. With Doctrine 2, PHP takes a huge step into the right direction – Doctrine 2 is fast, extensible and easy to use.

This article will take you on a tour through the main concepts of Doctrine 2 in the first part and then explain how to use it in a real world application in the second part. Since at the time of writing Zend Framework 1.11.xx (ZF) is very popular, we will integrate Doctrine 2 into a ZF project.


Basic Concepts

To understand Doctrine 2, we have to take a look at some relevant terms (or in this case objects), study their behavior and practice their usage. We start with some introductory phrases on ORM systems and then go on to the concepts underlying Doctrine 2: Entity Objects, the Entity Manager, Repositories and Proxies.


Object-relational Mapping

Since the beginning of Object-Orientation, people had to manage the persistence of their application's state resp. their objects. In the context of Web Application Development, this usually involves a Database server which is being consulted using a Query Language. One example for this pattern is a PHP application that uses some kind of SQL server by sending SQL queries to it. Another one is an application using a CouchDB server by querying it via its REST API.
Due to the author's laziness, we will talk in terms of relational databases from now on. Keep in mind, that you can accomplish almost everything mentioned here with NoSQL databases, too.

ORM relates value objects that exist in an application's business logic to database records. Thus every object that should be persistent is saved in one row of a database table. The most common approach is to map classes to tables and the classes' objects to rows in the these tables.
Besides writing objects to a database, ORM systems are also intended to ease the process of finding data stored in the database. When talking in terms of ORM, finding data always means making the framework fetch one or many objects that meet a certain criteria.


Entity Objects

The objects that are being managed by an ORM system are called Entity Objects. Every entity object relates to one entry in a table. In Doctrine 2, the classes that represent entities do not have to fulfill special requirements like inheriting from a certain super class (as you might have seen in other database abstraction frameworks like Zend_Db). When creating a new entity class with Doctrine 2, all you have to do is to write down a regular PHP class with properties. Besides this, you have to provide some hints on how these attributes should be persisted. The information how entity attributes relate to columns in the DB is called Metadata. Metadata can be described in different ways: By default there are metadata drivers for descriptions in XML, YAML and PHP. The fourth and most popular driver is based on DocBlock annotations (since in PHP, annotations aren't a language feature as in Java (see Wikipedia), they are contained by the classes' and attributes' DocBlocks). We will use annotations to describe our entities metadata. To get an impression on how easy this is, take a look at the following example.



This example contains all it needs to tell Doctrine 2 about the new entity User. With this class, you can create, find, delete and modify user objects and persist their state to the underlying database. But keep in mind: as long as you don't need any persistence features, you can use your user objects just like any other objects!

The next two objects resp. object types we will describe are responsible for doing the ORM functionality: persisting and finding.


The Entity Manager

To use ORM functionality, the Entity Manager (Doctrine\ORM\EntityManager) is the main access point to Doctrine 2. The entity manager is responsible – as you might have guessed – for managing entities and for building a facade for the whole framework. To accomplish its tasks, the entity manager uses some helpers. The Unit of Work object for example collects entities that should be written back to the database and is capable of doing this in batches. This way, database operation can be executed with almost no overhead and therefore are really fast.

Another dependency of the entity manager is the Event Manager. To be as extensible as possible, Doctrine 2 comes with an event system that publishes all important state changes to the outside as events. You can register for such events and extend the life cycle of your entity objects at one single point.

The entity manager's API combines methods for managing entities (find, persist, contains, copy, detatch, merge, remove and refresh), methods that control the use of transations (beginTransaction, commit, flush, rollback and transactional) and some helper methods for creating custom queries and accessing some of the entity manager's dependencies.

The following example shows how to query an object from the entity manager, modify it and write the changes back into the database.



Creating a new persistent object is almost as easy as modifying it:




Repositories

For finding entities, Repositories are used. Every entity class has its own repository which is responsible for finding entities of that type. By default, repositories have some handy methods for fetching entities that match certain criteria:

  • find: Finds an entity by its primary key / identifier
  • findAll: Finds all entities of the repository's entity type
  • findBy / findOneBy: Finds all resp. one entity that matches the passed criteria:
  • findBy<attribute> / findOneBy<attribute>: Magic methods that ease the filtering by a single attribute:

To access a repository, all you have to do is ask the entity manager for one. If you have implemented your own repository, it will be returned by Doctrine\ORM\EntityManager::getRepository(). Otherwise, Doctrine 2 will provide a generic repository. The main reason to implement custom repository classes is to group custom queries for an entity type to make them reusable. For custom query logic, there are several mechanisms you can use: You can either use Doctrine's query builder that implements an API similar to Zend_Db_Select or queries written in the Doctrine Query Language (DQL) or you can even execute plain SQL queries. With these options, it is also possible to migrate old applications which use complex queries by just wrapping these queries into the methods of custom repositories.


Proxies

When traversing a graph of entity objects (which is required when entities are having relations to other entities), it would be very expensive (in the sense of “requiring many database queries”) to fetch every depending entity with an additional query. Therefore Doctrine 2 uses the concept of Proxy objects that represent regular entity objects which have not been populated with data from the database. Take a look at the following example where the entity Group aggregates a list of User objects in its member property. When accessing the members list, Doctrine 2 provides a collection of proxy objects instead of complete User objects. When an object of this collection is being asked for one of its properties, Doctrine loads the object's data from the database. This way, the users' data is not loaded until it is really needed.




Advanced Mapping Concepts

This section describes some advanced concepts that are required when mapping entity classes that have relationships to other entity classes. Possible relationship types are association and inheritance. Inheritance is the mechanism used for representing subtypes in object-oriented programming languages. An example would be a class User that implements methods every user of a software should have and a class Administrator inheriting from User that adds methods for determining the administrator's access rights.

Association is a weaker relation type. It means that an entity object can be related to other entity objects of other types. In terms of relational databases, there are three types of association which differ in the number of entities an object is related to: 1:1, 1:n and n:m relationships. n and m are placeholders and mean multiple.


Association

To put objects of an entity type into relation, you just have to mention this relation in the entity class' mapping information. The simplest case is a unidirectional 1:1 relationship. In the following example we describe a User entity which has its access information (user credentials) encapsulated into another entity class called UserCredential. Since every user has at most one credential object and every credential object may only be associated to one user object, this is a 1:1 relationship.



If the relationship should be bidirectional, include the OneToOne attribute in the other class, too, and add an attribute which denotes the attribute of the other entity that mapps the related object:



This way, you can access the user object from the credentials object, too.
Most of the times, developers have to deal with relationships which include many objects on at least one side. These relationships are called 1:n or n:m relationships. This means that either one or multiple entities are standing in relationship with an arbitrary number of entities of another type. To accomplish this, you have to use the mapping keywords OneToMany or ManyToMany when describing your entities. Besides that, the mapping works the exact same way as with 1:1 relationships.

There are however some tricks you should know when dealing with collections of associated entity objects. Consider the following relationship between the entity classes User and Group:



When a group has at least one member, the group object will have a collection of the type Doctrine\Common\Collections\ArrayCollection set as its members property. This collection contains all user objects (or proxy objects as we have seen before) and can be modified intuitively with the methods add and removeElement. To honor object-orientation, you might want to introduce custom methods for these tasks. If you do so, you get into trouble when the group object does not have any users associated. In this case, the collection will simply be set to null. To avoid checks whether the collection has already be initialized, you should to this by yourself in the entity class' constructor:



It is also important to notice that one entity has to update the other entity's state as well when a relationship between to objects is created or removed. Take care to do this only in one class to avoid endless recursion loops! This class is called the Owning Side of the relationship. When implementing a bidirectional relationship, the other class is called the Inverse Side. It is important to determine owning and inverse side and implement the the classes accordingly to avoid greater trouble during debugging.

There are some more features implemented by Doctrine 2 enabling developers to specify their entities' relationships including sorting, pre-fetching and indexing. These topics are not covered in this article but are explained very understandable in the Doctrine 2 documentation.


Inheritance

Subtyping can be implemented in different ways using Doctrine 2. The main difference between these implementations is how the inheritance is mapped to the database. The options are to have one table for every class (Class Table Inheritance), to have one table for all classes in a hierarchy (Single Table Inheritance) and to have a table for every specialized sub-class of a given super-class (Mapped Super Class).We will give a short overview on all three alternatives, you have to pick the right one yourself. This decision should be made based on how many common attributes there are in your sub-classes.

Mapped Superclasses

Introducing a mapped superclass is probably the easiest way for specifying inheritance but might lead to many duplicate columns in your database schema. The superclass of your entities is not being declared as an entity itself (and might also be declared abstract) but provides attributes and optionally methods that will be available in all subclasses. When creating the database schema, Doctrine 2 merges all attributes and relationships of the superclass into the definitions of the subclasses and processes them as regular entities.



After creating the database from this mapping information, your tables will look like this:

Single Table Inheritance

When having entities that are very similar besides some few attributes, you might want to store them together in one database table. This approach is called Single Table Inheritance. To distinguish between the different types, there is always a column marked as discriminator column and a discriminator map that tells Doctrine 2 which values in the discriminator indicate what entity types.



These definitions cause the existing of one single table called User with all the attributes declared inside the classes User and Administrator plus a column type – the discriminator column. When working with entities of these types, Doctrine will manage the type flag automatically for you.

The resulting database schema looks as illustrated by the following diagram:

Class Table Inheritance

Having each entity type stored in its own table is always good for keeping your schema extensible. When you have to create a new subtype, Doctrine 2 will just create a new table for this type and it can inherit the logic and common attributes of a superclass. The only overhead you have with this approach is that all tables that correspond to subtypes have to maintain a relationship to their supertype's table. Using class table inheritance, the example with the entities User and Administrator looks like this:



Besides the inheritance type, there is no difference to the example using single table inheritance. The outcome on the resulting database scheme is huge. Now you have to separate tables which store users and administrators. Every record in the table Administrator has a corresponding record in the User table.


This was the first part of this article. Stay tuned for part II which will be published tomorrow (on 6th of December 2011)! In the second part, we will integrate Doctrine 2 into a Zend Framework application and include a generic sandbox (ZF-)project with Doctrine 2!


PlanetMySQL Voting: Vote UP / Vote DOWN

More MySQL foreach()

Декабрь 2nd, 2011

In my previous post I've shown several generic use cases for foreach(), a new scripting functionality introduced in common_schema.

In this part I present DBA's handy syntax for schema and table operations and maintenance.

Confession: while I love INFORMATION_SCHEMA's power, I just hate writing queries against it. It's just so much typing! Just getting the list of tables in a schema makes for this heavy duty query:

SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='sakila' AND TABLE_TYPE='BASE TABLE';

When a join is involved this really becomes a nightmare. I think it's cumbersome, and as result, many do not remember the names and meaning of columns, making for "oh, I need to read the manual all over again just to get that query right". Anyway, that's my opinion.

A SHOW TABLES statement is easier to type, but cannot be integrated into a SELECT query (though we have a partial solution for that, too), and besides, when filtering out the views, the SHOW statement becomes almost as cumbersome as the one on INFORMATION_SCHEMA.

Which is why foreach() offers handy shortcuts to common iterations on schemata and tables, as follows:

Use case: iterate all databases

call foreach('schema', 'CREATE TABLE ${schema}.event(event_id INT, msg VARCHAR(128))');

In the above we execute a query on each database. Hmmm, maybe not such a good idea to perform this operation on all databases? Let's filter them:

Use case: iterate databases by name match

call foreach('schema like wordpress_%', 'ALTER TABLE ${schema}.wp_posts MODIFY COLUMN comment_author VARCHAR(96) NOT NULL');

The above will only iterate my WordPress databases (I have several of these), performing an ALTER on wp_posts for each of those databases.

I don't have to quote the like expression, but I can, if I wish to.

I can also use a regular expression match:

call foreach('schema ~ /^wordpress_[0-9]+$/', 'ALTER TABLE ${schema}.wp_posts MODIFY COLUMN comment_author VARCHAR(96) NOT NULL');

Use case: iterate tables in a specific schema

Time to upgrade our sakila tables to InnoDB's compressed format. We use $(), a synonym for foreach().

call $('table in sakila', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB ROW_FORMAT=COMPRESSED');

The above will iterate on tables in sakila. I say tables, since it will avoid iterating views (there is still no specific syntax for views iteration). This is done on purpose, as my experience shows there is very little in common between tables and views when it comes to maintenance and operations.

Use case: iterate tables by name match

Here's a interesting scenario: you wish to work on all tables matching some name. The naive approach would be to:

SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'wp_posts' AND TABLE_TYPE = 'BASE TABLE'

Wait! Are you aware this may bring your server down? This query will open all databases at once, opening all .frm files (though thankfully not data files, since we only check for name and type).

Here's a better approach:

call foreach('table like wp_posts', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB');

(There's now FULLTEXT to InnoDB, so the above can make sense in the near future!)

The good part is that foreach() will look for matching tables one database at a time. It will iterate the list of database, then look for matching tables per database, thereby optimizing the query on INFORMATION_SCHEMA.

Here, too, I can use regular expressions:

call $('table ~ /^wp_.*$/', 'ALTER TABLE ${schema}.${table} ENGINE=InnoDB');

Conclusion

This is work in the making, but, as someone who maintains a few productions servers, I've already put it to work.

I'm hoping the syntax is easy to comprehend. I know that since I developed it it must be far more intuitive to myself than to others. I've tried to keep close on common syntax and concepts from various programming languages.

I would like to get as much feedback as possible. I have further ideas and thoughts on the direction common_schema is taking, but wish take it in small steps. Your feedback is appreciated!


PlanetMySQL Voting: Vote UP / Vote DOWN