Archive for the ‘Information’ Category

OpenDBCamp: Information Lifecycle Architecture

May 7th, 2011
The Open DB Camp in Sardinia 2011 has had a number of sessions on varying topics. The topics range from MySQL and MongoDB to replication and high availability.

I decided to tap into the database expert resources present here at Sardegna Ricerche by discussing a non-database issue, where one can expect database experts to have insights beyond those of end users. And they did.

The topic was the particular case of information overload many of us suffer from on our hard disks: Too many files, too hard to find.
  • How do we find the bank statement from April 2007 from the more-seldom-used account?
  • What are the ten best work-related pictures from last year?
  • Is this the most current version of the presentation of BlackRay?
  • Are these films from Cagliari already backed up? Also offsite?
It turned out that I am not the only one suffering from a slight chaos on my hard disk. We all have some basic discipline we try to follow to keep things in order, but the consensus seemed to be that disorder on the hard disk is a psychological problem to be solved by good habits, more than a technical problem to be solved by an application. This in itself is a revolutionary insight, to come from a bunch of techies.

Before going into the individual points, let me first share how I had framed the discussion:
Many OpenSQLCamp attendees spend lots of time communicating about our SQL projects, internally and externally. We spend lots of time architecting database systems, and managing the lifecycle of products.

We do little to implement a proper architecture for the non-database information we create and manage, in business and privately. We drown in emails, digital pictures, versions of downloaded PDF documents, video snippets, and attachments sent by colleagues, partners and private friends. Chaos ensues.

Disorder and low productivity are inevitable unless we are very disciplined in following some basic rules for keeping order on our hard disks, pods and pads. But what are those basic rules? And what tools can implement them?

I'm not coming in with more than a rough first sketch of "an Information Lifecycle Architecture", but I'd like to share ideas, thoughts and attitudes with my fellow OpenSQLCamp attendees. I'll present some slides and guidelines, and will make an attempt at collecting your thoughts into a summary afterwards!
I threw in a couple of basic ideas on how to handle the type of information that we have to manage as individuals, usually on our own hard disks:
  1. Separate /pub from /rep: Store raw information in its original form in one directory tree, the "repository". Store distilled information ready to be consumed in a separate directory tree, the "publications".
  2. Limit the allowed /pub formats: Allow very few formats for publishing (such as .jpg .mov .pdf .mp3 .ogg but not .doc .ppt .xls .cr2 .psd .oo3 or anything even more "exotic").
  3. Delete systematically: Don't save many versions of the same file. Don't save information that isn't needed.
  4. Sync easily: Set up the directories (and configure your software) so that it's very easy to sync the published files with your mobile devices (Androids, iPhones, iPads, iPods, digital frames), regardless of whether they are PDFs, JPGs, MOVs or MP3s.
  5. Order files by type: Above /pub and /rep, separate files by rough category: Pictures, Movies, Documents, Music.
  6. Order files by year: Under /pub and /rep, separate most files into directories by year. Month or quarter would be too frequent for most personal information.
  7. Order files by common sense: Under the year (or in exceptional cases directly under /pub or /rep), separate files by placing them into a smart directory structure, which you yourself decide about according to the topic, as opposed to delegating the file structure to the random preferences of some software (like iPhoto).
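Rules 1, 2, 5, 6 and 7 can be read together as a recipe for where a file should live. The following is only a minimal sketch of that reading, not a tool anyone presented at the session: the function name `target_path`, the extension-to-category mapping and the example topic are my own assumptions, and the allowed /pub formats are simply the ones listed in rule 2.

```python
from pathlib import PurePosixPath

# Assumption: a hypothetical mapping from extension to the rough categories of rule 5.
CATEGORY_BY_EXT = {
    ".jpg": "Pictures", ".cr2": "Pictures", ".psd": "Pictures",
    ".mov": "Movies",
    ".pdf": "Documents", ".doc": "Documents", ".ppt": "Documents", ".xls": "Documents",
    ".mp3": "Music", ".ogg": "Music",
}

# Rule 2: the few formats allowed for publishing (taken from the list above).
PUB_FORMATS = {".jpg", ".mov", ".pdf", ".mp3", ".ogg"}


def target_path(filename: str, year: int, topic: str, published: bool) -> PurePosixPath:
    """Return category/(pub|rep)/year/topic/filename for one file."""
    ext = PurePosixPath(filename).suffix.lower()
    category = CATEGORY_BY_EXT.get(ext)
    if category is None:
        raise ValueError(f"no category known for {ext!r}")
    if published and ext not in PUB_FORMATS:
        raise ValueError(f"{ext!r} is not an allowed /pub format")
    tree = "pub" if published else "rep"  # rule 1: publications vs. repository
    # Rule 5 puts the category above pub/rep; rules 6 and 7 put year and topic below.
    return PurePosixPath(category, tree, str(year), topic, filename)


if __name__ == "__main__":
    print(target_path("beach.jpg", 2011, "opendbcamp", published=True))
    print(target_path("beach.cr2", 2011, "opendbcamp", published=False))
```

The sketch only computes paths and never touches the disk, so it is safe to try out. Rule 7 still applies: the topic is something you choose yourself, not something the script can guess.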
Beat Vontobel, Liz van Dijk, Markus Popp, Sheeri Kritzer Cabral, Sergei Golubchik, René Cannao and others came with very good ideas and anecdotes. Let me here relate some of them, while they're in fresh memory:
  1. Blog your notes! Write your personal notes so that they're reusable for others. Publish them on your blog. Then you can use Google to find your own notes. I think this tip is smarter than it sounds at first, i.e. it's applicable to quite a few situations.
  2. Use version control! For some who are familiar with version control anyway, it may make sense to put presentations and various types of other personal information into a version control system.
  3. Use the cloud! Put some of the information in the cloud, for easy availability across machines, for easy syncing, for backup.
  4. Tags for files should be part of the operating system. You could tag expense reports, notes, contacts, pictures, films, documents and emails alike with #opendbcamp. The tagging should ideally work across operating systems.
  5. Order needs discipline. Any good habit of keeping order on the hard disk needs to be backed up by a commitment in time. If you slip once, and twice, and one more time, the discipline is lacking.
  6. Storage is cheap. Or is it? Here I noted two schools of thought. One would rather just tag anything and keep order by sorting. The other school would rather delete as much as possible, so that the remainder is smaller and hence easier to keep ordered. I belong to the latter one.
  7. Bad banks throw important yet unstructured information at you. You can get a bank account statement with a long filename which doesn't denote the year and month or bank account. You yourself have to parse the file, and name it properly. That's a burden even for a geeky OpenDBCamp visitor. Think of the poor average bank customers!
  8. The analog world forced you to have a physical relationship to your data. In order to use your CDs or spices or books, your mental maps of organising them were backed up by some physical structure. This physical structure is missing from digital data. It becomes easier to forget that you even have the information. We end up with a lot of pictures, music and videos we never use.
  9. Use Yojimbo http://www.barebones.com/products/yojimbo/ as an information organiser, if you're a Mac user.
  10. Does technology solve issues or create them? Earlier, we didn't have as many pics, films, CDs or books. Now, we have more of them, in a variety of forms. Does it really make sense to spend tens of hours sorting and otherwise maintaining your collections (of films, music, pictures)? Or is it better to have smaller collections, even of the seemingly "free" items such as digital pictures and films taken by yourself?
On that philosophic observation, let me end my personal notes from the "Information Lifecycle Architecture" session at the Open DB Camp, which I have now published and will be able to find later on by Googling it.


MySQL related bookmark collection

September 17th, 2009
I am publishing my MySQL related bookmark collection http://www.mysqlpreacher.com/bookmarks/. Feel free to send me links you think might be good to add in order to help others. Remember, SHARING IS CARING!!! We get so much for free, so why shouldn't we give some back? Cheers, Darren