Archive for the ‘cassandra’ Category

Cassandra and Ganglia

Сентябрь 4th, 2010
cassandra_tpstats_row_read_stage_completed

I finally got some time to do some house cleaning. One of my nagging low-hanging fruit stuff was stop running jconsole on one screen to see the state of all my cassandra boxes. I created a ganglia script to graph what is above. Above I am showing all the cassandra servers and their total row read stages as a gauge. Meaning that basically I am graphing the delta of the change between ganglia script runs. This gives me the reads over time based on deltas between runs.


How I have it set up is:

All data exposed by JMX to produce tpstats and cfstats is graphed via ganglia. The pattern for each graph is as follows

cass_{stat_class}_{key}

stat_class - tpc, tpp, tpa means complete, pending, active respectively
key - would be message deserialization for instance.

For column family stats I graph the keyspace stats as well as the specific column family stats exposed by cfstats. For instance below:

cassandra cfstats with ganglia

If your interested in the scripts I'll send it to you or put it up on code.google.com, its written in perl OOP perl and takes the same approach of packaging that maatkit tool kit for mySQL by Xarb and crew does (puts all the "classes" in the file as the application).

GmetricDelegate is the parent package
GmetricCassandra extends GmetricDelegate and overloads getData as well as defines what is an absolute stats vrs a gauge.

As you can see the pattern I also have
GmetricInnoDB
GmetricMySQL

and so on.

then on each server I run

/usr/bin/perl -w /home/scripts/ganglia_gmetric.pl --module=GmetricCassandra

this then talks to Ganglia through gmetric to report the stats.

PlanetMySQL Voting: Vote UP / Vote DOWN

Cloud openness contemplated

Апрель 15th, 2010

I caught some of the keynotes and discussion at the Linux Foundation Collaboration Summit today, and was particularly interested in the panel discussion on open source and cloud computing. While we are used to hearing and talking about how important open source software is to cloud computing (open source giving to cloud computing), moderator John Mark Walker posed the question of whether cloud computing gives back? The discussion also rightfully focused on openness in cloud computing, how open source might or might not translate to cloud openness and the importance of data to be open as well.

The discussion also centered on some issues regarding open standards and how open is open enough for cloud computing? It may depend on who you ask, but I tend to think that the flexibility, interoperability and portability advantages of open source software will dictate its continued use and true openness in the cloud.

However, this is not always the case. When we consider openness in the mobile market, we see that while open source software is going into more and more smartphones and mobile devices, by the time it gets into the product and into the hands of consumers, it ends up closed. This is not necessarily a violation of open source license, either in rule or in spirit, but rather the use, incorporation and reliance on open source alongside proprietary products, strategies and companies, typically under a permissive license. Much of it also has to do with the need, both perceived and real, for control of code in these devices among hardware, software, wireless carrier and other players with a stake.

Another interesting perspective of what open source means, or doesn’t mean, in terms of cloud computing, standards and interoperability comes from the Xen community’s Simon Crosby of Citrix.

One of the most interesting things to watch when considering whether cloud computing gives back to open source is the AGPLv3 license, which is viewed in different ways as both a burden and a boon to network-based, distributed development by various parties. We continue to see vendors, such as mobile software player Funambol, as strong supporters of AGPL while others, such as Google, continue their resistence to it.

The AGPL also came up in the Linux Foundation Collaboration summit panel again, and while I don’t think the license currently serves as the answer to whether cloud computing gives back to open source, we do see some benefits to open source from cloud computing, both in terms of code, projects and communities and the commercial vendors leveraging open source software. In terms of code, large users of open source software projects, such Linux, MySQL, Hadoop, Cassandra, help to raise the profile and credibility of open source. Whether corporations or university campuses, these large users can also be among the most active community participants — driving features and shaking out bugs, and most prolific code contributors — creating features and extensions and enlarging the ecosystem. In terms of commercial open source vendors, cloud computing can also mitigate the challenges of balancing and differentiating free, community versions and separate, paid versions. If the vendor is able to offer support, services or even extensions with the cloud version of its software, it is easily separated from a free, community version that may be available for free, but not from the cloud.

Of course, there is more that cloud computing can do for open source and there is much more that has to be done to ensure true openness in cloud computing, particularly when some existing and emerging defacto standards are anything but open, but for all that open source is to cloud computing, cloud computing seems to be returning the favor to some degree already.


PlanetMySQL Voting: Vote UP / Vote DOWN

Cassandra is my NoSQL solution but..

Март 24th, 2010
In the past few months, I have tested many NoSQL solutions. Redis, MongoDB, HBase yet Cassandra is the Column Store DB I picked because of its speed (on writes), reliability, built in feature set that makes it multi-datacenter aware. The one other personal reward for Cassandra is it is written in Java. I like reading and writing in Java more than C++ although it really does not matter for me personally in the end.

Let us talk about the reason why I am introducing Cassandra into my infrastructure and some of its drawbacks I have noticed so far.

Why it is being introduced:
We have a feature where we record every single click for 50 million Monthly Active Users (real-time) and storing this in mySQL is just waste of semi-good hardware for data that is only looked at for the past 24 hours. Over the course of some time (couple of months) more than 3 billion rows accumulated, which translated into a 3.5 TB distributed INNODB datafile. So purging/archiving this data just sucks.

Now introducing user clicks into Cassandra was rather easy. I researched various sites, asked my Digg buddy and then figured out the rest. Within two days I was up and running with Cassandra and had a great understanding about Column Families and SuperColumns.

Developing with a Cassandra Data layer:

Now that the infrastructure is up, I needed to add a data model to /opt/cassandra/conf/storage.conf. The SQL that drove this functionality consisted of two main SQL statements.
Add click
Get a Range of Clicks

For these operations, mySQL rarely takes 0.001 seconds (1 ms).

Cassandra for writes is rather fast, but for reads, Getting Range Clicks
i.e. using
recv_get_slice

it takes .02 seconds (20 ms).

What does this mean? MySQL is a hell of a lot faster! Is it because of my CF design? No.
for instance, take this slug (keys separated by a delimiter to make a distinct key name).

{$clicker}_{$pageowner} => [ $object_id_clicked0 => $value,$ts, ... $object_id_clicked =>N $value,$ts ]


The slug says that clicker A, clicked on page B and what is stored are columns of distinct clicks - remember this is a column store DB.

So
{$clicker}_{$pageowner} == the KEY for the COLUMN $object_id_clicked with the value being a $value and the free extra value the $timestamp.

Here is the php code

$columnPath = $this->getColumnPath($objectid);
$this->getCassandraConnect()->insert(self::KEYSPACE, $this->getKey($clicker, $pageOwner), $columnPath, $this->today_ts, microtime(true), ConsistencyLevel::ZERO);



Now I want a list of items clicked.

$data = $this->getCassandraConnect()->get_slice(self::KEYSPACE, $this->getKey($clicker, $pageOwner), $this->getColumnParent(), $this->getSlicePredicate(), ConsistencyLevel::ONE);


This says give me the last N logged clicks that the clicker A made for page owner B. This is a hash lookup (Big-O(1)) but a sorted list of columns (O(nlogn)) and return the last N elements.

Why is Cassandra sooo slow on reads. Is it because my memory config is not enough?
No. 7 GB of data is allocated for data that fits in memory ( for now ).


Is it because my disk is saturated?

avg-cpu: %user %nice %system %iowait %steal %idle
12.21 0.00 2.85 0.48 0.00 84.47

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.01 14.01 0.23 6.61 22.25 165.03 27.34 0.33 47.57 7.35 5.04
sda1 0.00 0.00 0.00 0.00 0.00 0.00 20.83 0.00 7.74 6.40 0.00
sda2 0.01 14.01 0.23 6.61 22.25 165.03 27.34 0.33 47.57 7.35 5.04
dm-0 0.00 0.00 0.24 20.63 22.25 165.03 8.97 0.72 34.27 2.41 5.04
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 5.28 0.22 0.00



No its not.

My hunch is the slowdown is a combo of the low thread read pool and in the inherent sorting and there doesn't seem to be a way to turn it off, so without digging in the code I couldn't say (I will be able to once I get an understanding of the code layout).


In conclusion, I like Cassandra, it’s very fast in writes, slow (for my taste but fast enough) in reads and what takes 2 lines of SQL takes 250 lines of PHP code interfacing with the Cassandra.thrift suite. I am going to use it still because it is good enough and I love the built in HA of it. Additionally the performance is improving very quickly.

PlanetMySQL Voting: Vote UP / Vote DOWN

Log Buffer #182, a Carnival of the Vanities for DBAs

Март 12th, 2010

This is the 182nd edition of Log Buffer, the weekly review of database blogs. Make sure to read the whole edition so you do not miss where to submit your SQL limerick!

This week started out with me posting about International Women’s Day, and has me personally attending Confoo (Montreal) which is an excellent conference I hope to return to next year. I learned a lot from confoo, especially the blending nosql and sql session I attended.

This week was also the Hotsos Symposium. Doug’s Oracle Blog has a series of posts about Hotsos. If all this talk about conferences has gotten you excited, Joshua Drake notes that 14 days and the hotel is almost full for postgresql conference east which is March 25th-28th in Philadelphia. And the Oracle database insider notes that the Oracle OpenWorld call for papers is now open.

According to Susan Visser this week (ending tomorrow) is also read an e-book week. So if you have not already done so, read an e-book! She links a coupon for an e-book in the post.

Craig Mullins notes that the mainframe is a good career choice in Mainframes: The Safe IT Career Choice. He notes that the mainframe is still not dead:

People having been predicting the death of the mainframe since the advent of client/server in the late 1980s. That is more than 20 years! Think of all the things that have died in that timespan while the mainframe keeps on chugging away: IBM’s PC business, Circuit City, Koogle peanut butter, public pay phones, Johnny Cash… the list is endless.

In other career-related news, Antonio Cangiano is looking for [2] top-notch student hackers for a 16-month internship at IBM in Toronto starting in May. All the details, including how to apply, are in Cangiano’s blog post.

Willie Favero wants to know how you “solve the batch dilemma” for issues like “shrinking your batch window, designing your batch to play nicely with … OLTP” in how’s your batch workload doing? Perhaps Favero should read the updated batch best practices posted by Anthony Shorten.

Bryan Smith surveys a more personal question by asking if you go both ways and “manage both DB2 for Linux, UNIX, and Windows and DB2 for z/OS” in don’t ask, don’t tell, bi-platform DBAs. This week’s Log Buffer editor admits to being a tri-platform DBA — she has tried many platforms, and in fact, many databases (MySQL, Oracle, DB2, SQL Server, Sybase, Postgres and Ingres)!

Hari Prasanna Srinivasan promotes a patching survey in Oracle really wants to hear from you! Patching Survey.

Henrik Loeser explains what a deadlock and a hot spot are by using a real life analogy taken from a police report in deadlock and hot spot in real life.

Jamie Thomson asks why do you abbreviate schema names?. Shlomi Noach tries to solve the issue that “there is no consistent convention as for how to write [about table aliases in] an SQL query” in proper sql table alias use conventions. Noach also gives us a tip: faster than truncate.

Leons Petrazickis reminds us that “rulesets are chains” and it is important to have your rulesets in the proper order in iptables firewall pitfall.

Anyone interested in the history of MySQL AB will be informed after reading Dries Buytaert’s article.
Gavin Towey shares his software that helps centrally manage 120 MySQL servers in qsh.pl: distributed query tool For those who want to learn more about column-oriented databases, particularly in MySQL, Robin Schumacher of the InfiniDB blog announces that there is a MySQL University session recording on MySQL column databases now available. MySQL join-fu expert Jay Pipes has moved his blog to www.joinfu.com and starts with An SQL Puzzle and of course a follow up on the sql puzzle.

Ivan Zoratti is happy that finally, slides posted for the MySQL DW breakfast. Venu Anuganti gives you tips on one of the most common MySQL frustrations: optimizing subqueries in how to improve subqueries derived tables performance. Justin Swanhart posts the way in which he Gets Linux performance information from your MySQL database without shell access and emulates a ‘top’ CPU summary using /proc/stat and MySQL using the same method.

The Oracle Apps blog has an introduction to Oracle user productivity kit (UPK). Even though in this editor’s opinion the article is very sales-pitchy, it has valuable information, and does indeed live up to its promise:

UPK is a software tool that can capture all the steps in a system process. It records every keystroke, every click of the mouse, each menu option chosen and each button pressed. All this is done in the UPK Recorder by going through the transaction and pressing “printscreen” after every user action. From this, without any further effort from the developer, UPK builds a number of valuable outputs.

Allen White gives a great tip on how to optimize queries in keep your data clean.

Mike Dietrich reminds you to remove “old” parameters and events from your init.ora when upgrading, “as keeping them will definitely slow down the database performance in the new release.” He shows evidence of slowness when this is not done. Dietrich also shows how you can be gathering workload statistics “to give the optimizer some good knowledge about how powerful your IO-system might be”, especially “a few days after upgrading to the new release…while a real workload is running.”

Brian Aker shows the exciting features coming soon in Drizzle in Drizzle, Cherry, Roadmap for our Next Release.

Maybe you are thinking of migrating, not upgrading…..The O’Reilly Radar shows how to asses an Oracle to MySQL migration in MySQL migration and risk management. Actually, that article interviews Ronald Bradford on the subject — Bradford has been prolific lately, updating free my.cnf advice series and “Don’t Assume”: MySQL for the Oracle DBA series. Nick Quarmby also talks about migrating Oracle, but not to a new database, just to a new platform, in his primer on migrating Oracle Applications to new platforms. And the big news comes from Carlos of dataprix that Twitter will migrate from MySQL to Cassandra DB.

Paul S. Randal explains his way of benchmarking: 1 Tb table population on SQL Server.

Pete Finnigan shares his slides from a webinar on how to secure oracle, and Denis Pilipchuk shares his approaches for discovering security vulnerabilities in software applications.

Jeff Davis shares his thoughts about scalability and the relational model. Robert Treat responds actually, the relational model doesn’t scale and Baron Schwartz counters with NoSQL doesn’t mean non-relational.

Buck Woody explains “whenever you want to know something about SQL Server’s configuration, whether that’s the Instance itself or a database, you have a few options” — and of course what those options are — in system variables, stored procedures or functions for meta data.

This week’s T-SQL Tuesday topic was I/O. There are many links to great blog posts in the comments; three random posts I chose to highlight: Michael Zilberstein talks about IO capacity planning, while Kalen Delaney talks about using STATISTICS IO in I/O, you know, and Merrill Aldrich chimes in with information on real world SSD’s. Aldrich also begs folks not to waste resources and make more work for developers and DBAs in dear ISV, you’re keeping me awake nights with your VARCHAR() dates.

And we end with a bit of fin: Paul Nielsen wants us all to have a bit of fun; he has posted an SQL limerick and asks readers to create there own in there once was in Dublin a query.


PlanetMySQL Voting: Vote UP / Vote DOWN

Liveblogging at Confoo: Blending NoSQL and SQL

Март 11th, 2010

Persistence Smoothie: Blending NoSQL and SQL – see user feedback and comments at http://joind.in/talk/view/1332.

Michael Bleigh from Intridea, high-end Ruby and Ruby on Rails consultants, build apps from start to finish, making it scalable. He’s written a lot of stuff, available at http://github.com/intridea. @mbleigh on twitter

NoSQL is a new way to think about persistence. Most NoSQL systems are not ACID compliant (Atomicity, Consistency, Isolation, Durability).

Generally, most NoSQL systems have:

  • Denormalization
  • Eventual Consistency
  • Schema-Free
  • Horizontal Scale

NoSQL tries to scale (more) simply, it is starting to go mainstream – NY Times, BBC, SourceForge, Digg, Sony, ShopWiki, Meebo, and more. But it’s not *entirely* mainstream, it’s still hard to sell due to compliance and other reasons.

NoSQL has gotten very popular, lots of blog posts about them, but they reach this hype peak and obviously it can’t do everything.

“NoSQL is a (growing) collection of tools, not a new way of life.”

What is NoSQL? Can be several things:

  • Key-Value Stores
  • Document Databases
  • Column-oriented data stores
  • Graph Databases

Key-Value Stores


memcached is a “big hash in the sky” – it is a key value store. Similarly, NoSQL key-value stores “add to that big hash in the sky” and store to disk.

Speaker’s favorite is Redis because it’s similar to memcached.

  • key-value store + datatypes (list, sets, scored sets, soon hashes will be there)
  • cache-like functions (like expiration)
  • (Mostly) in-memory

Another interesting key-value store is Riak

  • Combination of key-value store and document database
  • heavy into HTTP REST
  • You can create links between documents, and do “link walking” that you don’t normally get out of a key-value store
  • built-in Map Reduce

Map Reduce:


  • Massively parallel way to process large datasets
  • First you scour data and “map” a new set of dataM
  • Then you “reduce” the data down to a salient result — for example, map reduce function to make a tag cloud: map function makes an array with a tag name and a count of 1 for each instance of that tag, and the reduce tag goes through that array and counts them…
  • http://en.wikipedia.org/wiki/MapReduce

Other key-value stores:

Document Databases


Some say that it’s the “closest” thing to real SQL.
  • MongoDB – Document store that speaks BSON (Binary JSON, which is compact). This is the speaker’s favorite because it has a rich query syntax that makes it close to SQL. Can’t do joins, but can embed objects in other objects, so it’s a tradeoff

    • Also has GridFS that can store large files efficiently, can scale to petabytes of data
    • does have MapReduce but it’s deliberate and you run it every so often.

  • CouchDB
    • Pure JSON Document Store – can query directly with nearly pure javascript (there are auth issues) but it’s an interesting paradigm to be able to run your app almost entirely through javascript.
    • HTTP REST interface
    • MapReduce only to see items in CouchDB. Incremental MapReduce, every time you add or modify a document, it dynamically changes the functions you’ve written. You can do really powerful queries as easy as you can do simple queries. However, some things are really complex, ie, pagination is almost impossible to do.
    • Intelligent Replication – CouchDB is designed to work with offline integration. Could be used instead of SQLite as the HTML5 data store, but you need CouchDB running locally to be doing offline stuff w/CouchDB

Column-oriented store


Columns are stored together (ie, names) instead of rows. Lets you be schema-less because you don’t care about a row’s consistency, you can just add a column to a table very easily.

Graph Databases


speaker’s opinion – there aren’t enough of these.
Neo4J – can handle modeling complex relationships – “friends of friends of cousins” but it requires a license.

When should I use this stuff?





If you have:Use
Complex, slow joins for an “activity stream”Denormalize, use a key-value store.
Variable schema, vertical interactionDocument database or column store
Modeling multi-step relationships (linkedin, friends of friends, etc)Graph

Don’t look for a single tool that does every job. Use more than one if it’s appropriate, weigh the tradeoffs (ie, don’t have 7 different data stores either!)

NoSQL solves real scalability and data design issues. But financial transactions HAVE to be atomic, so don’t use NoSQL for those.

A good presentation is http://www.slideshare.net/bscofield/the-state-of-nosql.

Using SQL and NoSQL together


Why? Well, your data is already in an SQL database (most likely).

You can blend by hand, but the easy way is DataMapper:
Generic, relational ORM (adapters for many SQL dbs and many NoSQL stores)
Implements Identity Map
Module-based inclusion (instead of extending from a class, you just include into a class).

You can set up multiple data targets (default is MySQL, example sets up MongoDB too).
DataMapper is:

  • Ultimate Polyglot ORM
  • simple r’ships btween persistence engines are easy
  • jack of all, master none
  • Sometimes perpetuates false assumptions –
  • If you’re in Ruby, your legacy stuff is in ActiveRecord, so you’re going to have to rewrite your code anyway.

Speaker’s idea to be less generic and better use of features of each data store – Gloo – “Gloo glues together different ORMs by providing relationship proxies.” this software is ALPHA ALPHA ALPHA.

The goal is to be able to define relationships on the terms of any ORM from any class, ORM or not
Right now – partially working activeRecord relationships
Is he doing it wrong? Is it a crazy/stupid idea? Maybe.

Example:





NeedUse
Assume you already have an auth systemit’s already in SQL, so leave it there.
Need users to be able to purchase items from the storefront – Can’t lose transactions, need full ACID complianceuse MySQL.
Social Graph – want to have activity streams and 1-way and 2-way relationships. Need speed, but not consistencyuse Redis
Product Listings — selling moves and books, both have different properties, products are pretty much non-relationaluse MongoDB

He wrote the example in about 3 hours, so integration of multiple data stores can be done quickly and work.


PlanetMySQL Voting: Vote UP / Vote DOWN

Getting started with Cassandra

Февраль 24th, 2010

With the motivation from today’s public news on Twitter’s move from MySQL to Cassandra, my own skills desire following in-depth discussions at last November’s Open SQL Camp to consider Cassandra and yesterday’s discussion with a new client on persistent key-value store products, today I download installed and configured for the first time. Not that today’s news was unexpected, if you follow the Twitter Engineering Open Source projects you would have seen Cassandra as well as other products being used or evaluated by Twitter.

So I went from nothing to a working Cassandra node in under 5 minutes. This is what I did.

  1. While I knew this was an Apache project, a Google Search yields for me the 3rd link for the The Apache Cassandra Project at http://incubator.apache.org/cassandra/. Congrats for Cassandra now a top level Apache Project. This url will update soon.
  2. Download Cassandra. Hard to miss with a big green button on home page. Current version is 0.5
  3. I read Getting Started, which is the 3rd top level link on menu after Home and Download. Step 1 is picking a version which I’ve already done, Step 2 is Running a single node.
  4. The Getting Started indicated a problem on Mac OS X for the required minimum Java version. I was installing on Mac OS X 10.5 and CentOS 5.4. I’ve experienced this Java 6 default path issue before. Set my JAVA_HOME and PATH accordingly (after I updated the wiki with correct value)
  5. I extracted the tar file, changed to the directory and took at look at the README.txt file. Yes, I always check this first with any software and relevant because it includes valuable instructions on creating the default data and log directories.
  6. Start with bin/cassandra -f. No problems!
  7. I then followed the instructions from the link in Step 2 with the CassandraCli. This tests and confirms the installation is operational.

Ok, a working environment. I’ve now installed on a second machine and tested however I now need to configure the cluster, and the documentation is not as straightforward. Time to try out Google again.

On a side note, this is one reason why I love Open Source. I followed the instructions online and found a mistake in the Mac OS X path, I simply registered and corrected providing the benefit of my experience for the next reader(s).


PlanetMySQL Voting: Vote UP / Vote DOWN

Observations on key-value databases

Сентябрь 21st, 2009

Key-value databases are catching fire these days. Memcached, Redis, Cassandra, Keyspace, Tokyo Tyrant, and a handful of others are surging in popularity, judging by the contents of my feed reader.

I find a number of things interesting about these tools.

  • There are many more of them than open-source traditional relational databases. (edit: I mean that there are many options that all seem similar to each other, instead of 3 or 4 standing out as the giants.)
  • It seems that a lot of people are simultaneously inventing solutions to their problems in private without being aware of each other, then open-sourcing the results. That points to a sudden sea change in architectures. Tipping points tend to be abrupt, which would explain isolated redundant development.
  • Many of the products are feature-rich with things programmers need: diverse language bindings, APIs, embeddability, and the ability to speak familiar protocols such as memcached protocol.
  • I think there are more solutions here than the ecosystem will support, and in five years a few will stand out as the most popular.
  • This process of paring down the gene pool is win-win because they’re open-source, and nothing will be lost.
  • Choosing which one to use is no easy task even for a highly skilled, technical, up-to-date person. Perhaps the decision-makers will choose on the availability of commercial support and consulting.
  • Many of them offer built-in, dead-simple, distributed, synchronous replication. This is very difficult to achieve with traditional relational databases. What makes key-value databases different? They don’t have MVCC, for one thing; but I’m not sure of the complete answer to that question, to tell the truth.

We live in interesting times.

Related posts:

  1. Analyze and optimize memcached usage with Maatkit Ryan poste
  2. A review of SQL and Relational Theory by C. J. Date SQL an
  3. Sessions of interest at MySQL Conference and Expo 2009 I haven

Related posts brought to you by Yet Another Related Posts Plugin.


PlanetMySQL Voting: Vote UP / Vote DOWN

Cassandra database and range scans

Август 6th, 2009
I've been doing a little more playing with Cassandra, an open source distributed database. It has several features which make it very compelling for storing large data which has a lot of writes:
  • Write-scaling - adding more nodes increases write capacity
  • No single point of failure
  • configurable redundancy
And the most important:

  • Key range scans

Key range scans are really important because they allow applications to do what users normally want to do:
  • What emails did I receive this week
  • Give me all the transactions for customer X in time range Y
Answering these questions without range scans is extremely difficult; with efficient range scans they become fairly easy (provided you pick your keys right).

Other distributed-hash-table database (e.g. Voldemort) don't do this. This makes it difficult to do such queries.

Conventional RDBMSs do range scans all the time, in fact many queries which return more than one row will be implemented as a range scan.

Cassandra is extremely promising, but still a little bit rough around the edges; I've only done a small amount of research so far, but already found several bugs.

I can't complain about the service though; the main developer(s) have always looked into any problems I've reported immediately.

I hope it continues and becomes something really good.

PlanetMySQL Voting: Vote UP / Vote DOWN