Archive for the ‘PostgreSQL’ Category
Log Buffer #268, A Carnival of the Vanities for DBAs
Апрель 20th, 2012PlanetMySQL Voting: Vote UP / Vote DOWN
Log Buffer #268, A Carnival of the Vanities for DBAs
Апрель 20th, 2012PlanetMySQL Voting: Vote UP / Vote DOWN
Disproving the CAP Theorem
Апрель 1st, 2012We can therefore state with some confidence that CAP is confusing. Yet this observation itself raises deeper questions. Is CAP merely confusing? Or is it the case that as with other initially accepted but now doubtful ideas like the Copernican model, evolution, and continental drift, that CAP is actually not correct? Thoughtful readers will agree this question has not received anywhere near the level of scientific scrutiny it deserves.
Fortunately for science private citizens like me have been forging ahead without regard to the opinions of so-called experts or even common sense. My work on CAP relies on two trusted analytic tools of database engineers over the legal drinking age: formal logic and beer. Given the nature of the problem we should obviously use a minimum of the former and a maximum of the latter. We have established that CAP is confusing. To understand why we must now deepen our confusion and study its habits carefully. Other investigators have used this approach with great success.
Let us begin by translating the terms of CAP into the propositional calculus. The terms C (consistency), A (availability) and P (partition tolerance) can be used to state the famous "two out of three" of CAP using logical implication as shown below.
(1) A and P => not C
(2) P and C => not A
(3) C and A => not P
So far so good. We can now dispense briefly with logic and turn to confusion. It seems there is difficulty distinguishing the difference between CA and CP systems, i.e., that they are therefore equivalent. This is a key insight, which we can express formally as follows:
(4) C and A <=> C and P
which further reduces to
(5) A <=> P
In short our confusion has led us directly to the invaluable result that A and P, hence availability and partition tolerance, are exactly equivalent! I am sure you share my excitement at the direction this work is taking. We can now through a trivial substitution of A for P in equation 2 above reveal the following:
(6) A and C => not A
(7) C => (A => not A)
We have just shown that consistency implies that any system that is available is also unavailable simultaneously. This is an obvious contradiction, which means the vast logical edifice on which CAP relies crumbles like a soggy nacho. Considering the amount of beer consumed at the average database conference it is surprising nobody thought of this before.
At this point we can now raise the conversation up a level from looking for spare change under the table and comment on the greater meaning of our results in the real world. Which is the following: Given the way most of us programmers write software it's a wonder CAP is an issue at all. Honestly, I can't even get calendar programs to send invitations to each other across time zones. I plan to bring the combustible analytic capabilities of logic and beer to bear on the mystery of time at a later date. For now we can just speculate it is due to a mistaken design based on CAP.
PlanetMySQL Voting: Vote UP / Vote DOWN
Black-Box Performance Analysis with TCP Traffic
Февраль 23rd, 2012This is a cross-post from the MySQL Performance Blog. I thought it would be interesting to users of PostgreSQL, Redis, Memcached, and $system-of-interest as well.
For about the past year I’ve been formulating a series of tools and practices that can provide deep insight into system performance simply by looking at TCP packet headers, and when they arrive and depart from a system. This works for MySQL as well as a lot of other types of systems, because it doesn’t require any of the contents of the packet. Thus, it works without knowledge of what the server and client are conversing about. Packet headers contain only information that’s usually regarded as non-sensitive (IP address, port, TCP flags, etc), so it’s also very easy to get access to this data even in highly secure environments.
I’ve finally written up a paper that shows some of my techniques for detecting problems in a system, which can be an easy way to answer questions such as “is there something we should look into more deeply?” without launching a full-blown analysis project first. It’s available from the white paper section of our website: MySQL Performance Analysis with Percona Toolkit and TCP/IP Network Traffic
Further Reading:
- New Aspersa I/O analysis tool, diskstats
- Time TCP traffic with tcprstat
- How much network traffic does your MySQL server receive?
- High Performance MySQL, Second Edition: Query Performance Optimization
- Cary Millsap: Thinking Clearly about Performance
PlanetMySQL Voting: Vote UP / Vote DOWN
On datatypes, domains and why I think it’s time we reconsidered
Февраль 6th, 2012A datatype defines many attributes:
- What data I can store in it: Only numbers? Printable alaphanumeric characters? Unicode? Binary data? An object?
- What I can do with a value of that particular type and how does it behave? Can I concatenate two values? Add them (that is NOT the same as concatenate! The same operator (+) may be used, but it's not the same thing!)? Save it to a file?
- How is it stored and represented in technical terms. And this can be argued, a C int doesn't define how it is stored on disk, but a MySQL INT does! And a BLOB is a type that is largely there just because it defines how to handle things on disk!
No, I'm not a big fan of determining which datatype to use by considering storage requirements to optimize performance. That is something you might do as a performance enhancing measure after the job is done, possibly. In C, which is a language I have used longer than I care to remember, int seems to serve the role of being the default numeric integer datatype, which is not a bad thing, in particular not in C as C is a language used to write low-level stuff in, stuff that is performance sensitive, and a C int is linked to hardware specific integers.
FLOAT is another issue altogether. In very few business applications is there a need for FLOAT or DOUBLE in terms of IEEE floating point values. Really! This is a scientific number format, that is also supported by the Floating point unit in most computers of today, so operations on them are pretty fast these days. But I have seen too many business applications where the developer uses a FLOAT in the database for any kind of numeric value that is not an integer, which is a distinctly bad idea! It works as long as the numbers aren't too big, and when the operations on them are reasonably simple. This is not to say a FLOAT or DOUBLE isn’t useful, just that they are more targeted for scientific programming (Hey, they are after all defined by IEEE!). What most people really should use is DECIMAL (I'm using MySQL datatypes for reference, but you know what I mean). This is a datatype that accepts (hey, big surprise coming up here) decimal numbers, fixed point decimal numbers! Operations on DECIMAL are slower though, as they aren't really supported directly by either C (which is the language that MySQL is largely written in) or by the CPU / FPU. But they are better for business applications (for scientific applications, use FLOAT or DOUBLE. And continue to write your code in Fortran by all means!)
But all these datatype are increasingly used also to define the domain of the data in them! The SQL DOMAIN feature is in the SQL Standard but is largely forgotten. It is there in PostgreSQL for you to try. The advantage of this, in my mind, isn't just that you can define in more exact terms what data should go into a database column, without writing code to do that, and to make it apparent in the database schema what data is expected, it also means that we can get away from the wide array of different built-in datatypes in, say, MySQL. Instead we can stick with a few optimized ones, let them have a simple inherent domain and then we can define the application specific domains in application code. Anyone for a IP_ADDRESS datatype? A ZIPCODE datatype? An URL type. All those a reasonable requests from the application POV, but it really shouldn't (in my mind) be defined by the database server (As: What are the semantics of a ZIP-code? They are different in different countries, and the post-office can change their minds (although that is something that takes a long time I guess)). Why? As this is application specific domain! And application specifics should be in application code! Simple as that!
So what datatypes are useful then, the way I look at it:
- RAW Data - This should be a basic type. As long as can be, or as short. Any data can be stored in it, and nearly as much as you would like.
- Integers - Although an integer can be considered to be a special case of a more generic number, the integer domain is so generic, and the integer is such a common and basic type that is deserves a place here.
- Decimal – Fixed point decimal numbers.
- Float - Now we are getting close to the application domain here, but these guys deserve a place here anyway, as they are so common and hooked up to the programming languages and hardware.
- DATETIME - Only one of them is really needed. One of DATETIME, DATE, YEAR, MONTH, TIMESTAMP, TIME etc etc. that is. Just one, very generic base type, where the specific domain is, you've heard it before, application specific!
- Text - Reluctantly I add this to the list. Although this is just a RAW with some limitations (usually only UTF8 or ISO8859-15 or something characters), there are some operations and attributes that are so tied up with text strings that a TEXT type is reasonable to include.
I have now mentioned SQL DOMAINs a few times, so lets spend some quality time with them right now before closing up, here is a simple session using PostgreSQL 8.4:
- First we create a domain:
CREATE DOMAIN yesno AS CHAR(1)
DEFAULT 'N'
NOT NULL CHECK (VALUE = 'Y' OR VALUE = 'N'); - Then we create a table that uses that domain:
CREATE TABLE user_active(username CHAR(10), is_active YESNO); - And then we check if it works using a few SQL statements:
INSERT INTO user_active VALUES('foo', NULL);
ERROR: domain yesno does not allow null values
INSERT INTO user_active VALUES('foo', 'Y');
INSERT 0 1
INSERT INTO user_active VALUES('foo', 'N');
INSERT 0 1
INSERT INTO user_active VALUES('foo', 'A');
ERROR: value for domain yesno violates check constraint "yesno_check"
I have more to say on types, and I will follow up with a new post on this subject eventually, but this is enough for now.
/Karlsson
PlanetMySQL Voting: Vote UP / Vote DOWN
Continuent Partners With VNC To Collaborate in DACH
Февраль 1st, 2012PlanetMySQL Voting: Vote UP / Vote DOWN
MySQL and PostgreSQL Cloud Offerings – linux.conf.au 2012 miniconf talk by myself and Selena
Январь 23rd, 2012Selena and I gave a talk on the various issues of running databases “in the cloud” at the recent linux.conf.au in Ballarat. Video is up, embedded below:
PlanetMySQL Voting: Vote UP / Vote DOWN
German webinar Feb 7th at 15:00 CET: Hochverfügbarkeit und Performance von MySQL und PostgreSQL
Январь 18th, 2012PlanetMySQL Voting: Vote UP / Vote DOWN
I Really Dislike Anonymous Attacks
Ноябрь 14th, 2011- Sign your name. Readers are more impressed when they see you are not afraid to stand behind your words.
- Explain what problem you were trying to solve. Otherwise uncharitable readers might think you just started pumping information into a new database without thinking about possible consequences and now want to blame somebody else for your bad decision.
- Explain how you could do better. Not all designs work out, so propose alternatives. Readers love to see authors demonstrate that they are not discouraged by adversity.
PlanetMySQL Voting: Vote UP / Vote DOWN
MariaDB Statistics and Surveys
Октябрь 19th, 2011I just finished reading a couple of interesting, and somewhat related, blog posts which I think are worth sharing (apologies to anyone who has already seen them). One is from Jelastic and the other is from Michal Hrušecký.
I’ve written about MariaDB and the Jelastic cloud before (see MariaDB now available as a hosted database via Jelastic cloud platform). Now Jelastic has published statistics on the relative popularity of the various databases they offer. The good news is MariaDB is currently the database of choice for 14% of their customers. The bad news is that we’re in fourth place behind their other three database choices (MySQL, PostgreSQL, and MongoDB). However, MariaDB has only been available on their platform for a little over two months and we’re very happy that so many users of Jelastic’s state-of-the-art Java cloud are choosing to use MariaDB. Thanks!
In the second blog post, Michal Hrušecký shares the results of what he terms his “little MySQL survey”. This time MariaDB comes in second behind MySQL Community Server (and ahead of MySQL Cluster and Percona Server). In case you didn’t know, Michal packages MySQL and MariaDB for openSUSE, so this survey was a good way for him to judge the relative popularity of some of the more popular variants and give him ideas on how to improve things.
BTW: If you know of other interesting/informative/etc… blog posts related to MariaDB, let us know so we can add them to the Blog Posts Relevant to MariaDB page of the AskMonty Knowledgebase!
PlanetMySQL Voting: Vote UP / Vote DOWN