Archive for the ‘PyObjC’ Category

kajtajm: Planning my first Python / PyObjC / MySQLdb project

Январь 17th, 2010

Never have I planned a program so much in detail before starting coding. And stranger still, I’m planning the program to be coded in a language in which I haven’t written a single line of code. Nonetheless, I’m eager and it feels good.

I woke up early this Sunday, my mind concentrating on the many steps I will have to take before the first project will see the light of day. I had planned eight prerequisite lessons before even starting the coding. And despite receiving many good pieces of advice, nobody had relieved me of my main worry — the input grid for the data entry in matrix format.

To speed up things, I thought of writing some auto-generated INSERT statements from the current Google Spreadsheet that I’ve been using. I would then at least have the basic data from the first two weeks of the year to play around with in my MySQL database.

And then it struck me: I don’t need a grid at all, ever. I’m much better served by a mere text field.

Look at how I record my time bookings right now as I go, in a flat file in the standard Mac text editor (a bit sloppily when it comes to indentation):

Sun 17.1.2010
0700-0730 PyObjC Thought of how to get rid of the grid
   -0800 kajtajm Planned the object structures
    -0830 routine Showered and dressed
    -0845 PyObjC Planned a blog entry

This is an extremely fast way of booking time. Why wouldn’t I use the same simple syntax in a Cocoa text box instead of a grid? Using a multi-line text box for input will relieve me of finding a grid, learning it, and fighting it to follow my needs. And I’d like it to behave like a text field, anyway. The pretty straight column formatting of a grid is a forsakable luxury for me.

I would assume Python is easily able to parse my sloppily indented text and enter it properly into my MySQL table, which I plan to look like this:

CREATE TABLE kajtajmrow (
 kajdate  DATE,
 timefrom SMALLINT(4) ZEROFILL,
 timeto   SMALLINT(4) ZEROFILL,
 project  CHAR(10),
 kajtext  CHAR(40),
 hours    DECIMAL(8,2),
 keyword  CHAR(10),
 PRIMARY KEY(kajdate, timefrom));

I already made a couple of technical decisions in writing my CREATE statement.

  1. Field naming. Short names. To get rid of reserved words, I prefix the fields in question with “kaj”, as it’s then clear the name is something I came up with, and as it relieves me of inventing supposedly more descriptive names that are so long they render my SELECT output hard to read. I think the end result is descriptive enough and will minimise distraction and misunderstandings later on when I code and work with the MySQL client command line.
  2. Time formats. I hardly expect to do much insightful statistics based on the time of day, so I reduced the times to convenient input devices. I avoid the hassle of converting times to and from a TIME format that in this case pleases only the computer. I made them SMALLINT(4) to reduce space consumption which may give faster indexing when I load all my thousands of time bookings from the 1980s and 1990s, and I added ZEROFILL as my eyes recognise 0815 as a time quicker than 815. Besides, I intend to strictly enforce times ending in 00 15 30 or 45, so I will need to make considerations for checking the format anyway.
  3. Hours format. The DECIMAL(8,2) gives me a maximum of 999999.99 hours, and if I track on the average of 3000 hours a year, this should do it for the next 300 years. The field is able to represent hours that I don’t want, such as 3.14 h, but that’s a blow that I’m willing to take. Worse, I will want to report hours mostly as integers, so I will need to format them upon output. At times in reports, I may wish to use the fractions present in most fonts, such as 3½, 2¼, 5¾. That’s as accurate as 3.5, 2.25 and 5.75 but consumes less space and looks more similar to 3, 2 and 5. Luckily, hours just a computational, non-editable field derived from timefrom and timeto. Still, I decided to store it, as the parsing rule for calculating hours from timefrom and timeto is more easily implemented in Python upon editing, than in MySQL upon reporting (where it’s doable but cumbersome).
  4. Text fields and formats. I chose char(10) for the main project field. In ten chars, allowing mixed entry, I can quickly enter a fairly descriptive project code. The second text field is the free-form text explaining what I’ve done, where char(40) can contain sufficient text to explain what I’ve accomplished, without spending too much horizontal space on the screen or in a report (and if I need to explain in more detail what I did in an two-hour booking, I split it into two one-hour bookings). Finally, as longer projects have subgoals or subtasks, I added another redundant computational field in addition to hours: keyword. I will parse it in Python to be the first word of kajtext, again expecting simplified SELECT syntax when reporting. Sure, hours and keyword are strictly speaking redundant fields. But I can easily check for errors, and I expect the savings in clarity and lack of redundancy in the code to far outweigh the drawbacks of redundancy in the data.
  5. The keys. Theoretically, the database structure allows for double bookings, with one task for 1015-1130 and another 1030-1115. I may need to write a cleanup utility to identify such double bookings later on.

With the database structure set, it’s time for planning the Python program. As I noted in a previous post, I need an input and an output screen.

For mere input of new time bookings, I don’t expect many complexities in the UI. I’ll enter bookings into an empty text box, either writing or pasting text according to the simple syntax in the example on top.

  • If data is valid and interpretable, then clicking on the [Import] command button will successfully parse user input and convert it into INSERT statements.
  • If data isn’t valid, I think nothing shall be entered into the database. Rows which don’t pass (1014 is not a valid time in kajtajm) should be prefixed with a “?”, perhaps with the slight modification that rows with too long values for the project or kajtext fields are prefixed with “!” for easy distinction and thus better usability. (The redundant keyword field I’ll just truncate after ten characters, as keywording lines is optional and all data is present in the kajtext field anyway).

The above method of giving user feedback avoids the necessity of any further error messages, that would merely clutter both my code and my user interface. As a user, I think I will be able to understand how to edit invalid rows and remove the “?” or “!” by hand before trying [Import] again.

For editing existing time bookings, it’s trickier. Yes, I can use mostly the same syntax as for pure entry. But to be able to edit, I won’t have only INSERTs but will need also UPDATEs and DELETEs. And I will need a mechanism for the end user to identify which rows to edit.

Selecting the rows to edit should be an easy thing. I can offer input fields for kajdate (a range), project and kajtext. With “%” in the project and kajtext input fields, I can make the search into a LIKE. And I probably also need a maximum rows input field (default perhaps 50), to limit the number of rows retrieed into the text field.

Rows that the user didn’t touch don’t need any action, so the first thing I plan to happen when the user presses the [Merge] button (it’s no longer a mere import) is to compare the rows that are in the text box at that point in time with the rows that were read into memory from the database and served as the basis for user editing in the text box. If there are no changes, no MySQL statements are needed and database traffic will ensue (as it would cost response time even if I am at localhost).

This is how I plan to treat a row (in the form of a parsed time entry) that somehow differs from what was read in from the database.

  1. A disappeared row. This would perhaps intuitively indicate a delete, and does mean that the end user has deleted the row from the inbox. But that may have been by mistake, after having painstakingly entered fifteen new rows. So if I allow deleting rows this easily, I’m probably in for involuntary deletes and low usability. Besides, the user might want to simplify editing of rows by cleaning away correct rows from the text field, thus eliminating clutter. Anyway, I have decided to disregard a discovery of a disappeared row, and solve the delete need by requiring explicit, conscious deletion of some kind (see the next item!).
  2. The primary key (i.e. combination of kajdate and timefrom) existed in the set that was read in from the database, but at least one other field has changed. This I should likely convert to a mere UPDATE of the non-key fields of timeto, project, kajtext, hours and keyword. The exception would be that the user consciously wants to do the aforementioned delete. The best way to do this, that I’ve come up with, is to enter a special project code of “d” as in “delete”. Sure, that prevents the usage of such a project, but I wouldn’t want to use it anyway, and most importantly, I don’t want to make it easy to delete rows by mistake.
  3. The primary key didn’t exist in the set that was read in from the database. That sounds like a clear INSERT case to me, and that’s what it is.

In the input screen, I will probably want to provide some semi-hard coded data entry statistics, such as the hours worked this year for each of the six top projects (however many I decide will fit on the screen), or the total number of hours and rows entered. Most importantly, the input screen should show the time booking with the highest kajdate - timefrom combination, to remind me of where I need to continue my data entry.

The real benefit of keeping track of time comes from interesting reports. And here, I’m really looking forward to slicing and dicing the data in Python, given its tuple concept that enables array computations (smells like APL to me!). But first, the report has to be defined. The definition of the screening criteria can be the same as for the editing screen. Sorting needs extra attention. Possible sorting and aggregation fields are year, month, week [1], date, project and keyword. The way I thought of doing this is to simply allow the end user to write a code like YMP (year-month-project), P (project), MD (month-date) or PM (project-month) in a text field. Remember, the user is me, so I don’t need a lot of UI distraction. In fact, i consider such a text field has a high usability.

The above report definition is quite flexible, but has its limitations. For one, I would have fixed numeric columns, and for the first kajtajm release, I will always provide two columns: hours and percentage of total hours. Later, I could conceive of matrix reports where I have different columns for different projects, keywords or days of week. Also, I can imagine longer, more descriptive project names and sort orders, but for now, that’s strictly a distraction.

But now for the fun part that I have missed since I drifted into management and gave up programming: the matrix slicing and dicing of the reporting cube, at will. The fun starts already with converting the selection criteria into the WHERE part of the SELECT, and the sorting fields to a GROUP BY, where the only complexity (and it’s a minor one) is to extract YEAR(), MONTH(), and WEEK() out of kajdate.

I then expect to read the entire SELECT statement into memory in one fell swoop, then do both subtotal calculations and report formatting from the memory tuples. Here, I’m not yet sure of the Python structures needed. I haven’t even coded my “Hellöu wörld” yet. But I think I need a dynamically defined object (forgive me that I probably don’t use the proper Python lingo [2]) — let’s call it rptrow — with one attribute for each sorting / aggregation field, one for the hours and one for the percentage. I’ll then loop through the SELECT answer set, and enter a tuple of rptrow into an array for each row in the answer set, plus an additional row each time when a non-leaf-level aggregation field changes (to make room for the subtotals). I plan to do the subtotalling on top, since I like to read the subtotals in subheaders, not subfooters. With the SELECT set finally read into memory including spare rows for subheaders, I just need for Python to compute the subtotals and percentages. Then, my report is done, although it’s still only in memory.

The final proof is in the output of the report.

  1. Flat file, fixed font. To see whether I’m on track, I should just dump the memory structure into a flat file or a text field for a fixed font. This should even be quite legible on-screen, and can be used as input for prettier reports in, say, Google Docs or an email.
  2. Flat file, comma separated. I will also want to pretty-format some reports with non-fixed fonts but still straight right-aligned columns for hours and percentages. That can be done in a spreadsheet. The easiest way to import is probably using a comma separated file (and it would likely suffice that I write the “file” into a text field, from which I can copy it to the clipboard and then a file).

Perhaps I want to pretty-format a report directly from kajtajm, generating a PDF? Hmm, I tend to think this should be left on the wish list, like the direct clipboard dump. But if I come up with a way to paste the report into Google Docs as a table, I’d be happy.

That should be it! Of course I still need to write the program, but it does seem a lot more manageable now, after this “programming by thinking while blogging” session.

[1] Weeks are numbered 1-52 (some years 1-53) in a standard way in Europe.

[2] I owe it to Patrick Crews, database QA engineer at Sun/MySQL and self-confessed Python lover, that I came this far. Without his inspiring and unsolicited explanation of some key Python concepts, I wouldn’t have felt comfortable enough to go ahead with this thinking.


PlanetMySQL Voting: Vote UP / Vote DOWN

Rediscovering Programming with Python, Cocoa and PyObjC

Январь 16th, 2010

Use PyObjC to code Python under the Mac Cocoa framework, and MySQLdb to interface with MySQL! That’s the advice I got in reply to my recent blog. The discussions were so inspiring that I spent all time when out running yesterday daydreaming about how wonderful it will be to rediscover the aesthetics of programming and regain the control over my personal IT.

So, to recap a bit of my thinking: Once a programmer, always a programmer. If you’ve learned how to code and to master your own IT life to a certain degree, you get used to it. At least I did. But then fate wrote a declaration of incapacity for me. Not only did I become a manager and got to do less and less coding. More aggravatingly, I changed operating systems. Gone are the days when I could solve a task requiring simple programming in FOCUS, Visual Basic or Delphi. This is not because of rust (if you know how to bike, you can still do it even after long breaks), but because I can’t run my familiar tools on my Mac. And as for Java, PHP and Perl, I never learned them nearly as well, nor are they languages with which I presumed to want to control specifically the Mac environment, with which I expect to stay for a while.

Hence, I wrote my blog “Developing small Mac apps with MySQL” yesterday. And I’m very grateful for the inspiration provided by my Sun/MySQL colleagues on the Sun/MySQL internal discussion forum — thank you! The reply that ignited my feelings the most was by Mark Leith of the MySQL Support Team. He suggested me to go for MyObjC, as a way to combine Python, Cocoa (the prime Mac OS X software framework) and MySQL. And of course, I googled and read up on the topic.

This is why Python and PyObjC feels like the right choice for me to rediscover programming:

  1. Full control of Cocoa. PyObjC isn’t overprotective; there isn’t a limit after which I can’t go further. It’s as native as it gets on Mac, unless I’d program directly in Objective C. While interpretive when coding (a plus), it can be compiled for the standalone app (another plus).
  2. Good MySQL integration. Python is one of the Ps in LAMP. And googling for “PyObjC MySQL” gave 51000 hits.
  3. Proper aura. Python sounds modern and cool. Casually mentioning “a program that I just wrote in Python” would impress my colleagues born in the 1960s; coding comfortably in Cocoa would make myself feel younger.

So, Python and PyObjC it is. The next question is how to interface with MySQL.

  1. MySQL Connector/Python. When Geert Vanderkelen, also of the MySQL Support Team, put in his general pitch for Python, he mentioned that MySQL Connector/Python (myconnpy) is on the rise. Geert is the main contributor to myconnpy. However, going to https://launchpad.net/myconnpy I learned it’s still early alpha.
  2. SMySQL framework. When googling for “Cocoa” and “MySQL”, I hit upon http://mysql-cocoa.sourceforge.net/ which seems to be the fundament for some apps. There seemed to be several versions of it, I found out when stumbling upon http://www.sequelpro.com/cocoamysql.html.
  3. MySQLdb. Googling seemed to indicate that http://mysql-python.sourceforge.net/MySQLdb.html would be the right way to run MySQL not just from Python in general, but also from MyObjC in the Cocoa environment. Stating that my priorities are the MySQL classic value trio of Stability, Performance and Ease of use, Mark Leith confirmed that I was on track.

This means I’m ready to start. But I’ve been a sufficiently long time away from coding that I want to be a tad more systematic and goal oriented than just hacking around in general. Besides, I couldn’t just sit down, open my favourite editor and start writing, even if I wanted. I don’t know the Python syntax. And perhaps I could benefit from the added perspective of ten years without coding but instead with lots of management, networking and communication?

So here’s how I plan to proceed gaining familiarity with PyObjC:

  1. Books. That’s how I learned most other computer languages. Georgi ‘Joro’ Kodinov,
    manager of the MySQL Server Team, recommended “Cocoa Programming for Mac OS X” by Aaron Hillegass and in Mark Leith’s link to the Apple Developer Connection, the page “Using PyObjC for Developing Cocoa Applications with Python” recommended O’Reilly’s Learning Python, Second Edition which I borrowed yesterday in German from my local library Gasteig here in Munich. There are plenty of other books, but those are the two I plan to concentrate on.
  2. Training. Learning by doing was the method I used successfully so far, so I won’t change that. What I will change, though, is going about it in a more systematic fashion, which is why I’ve identified my learning goals below.
  3. Itches. The programming projects I want to write and use are the driving force behind all of this, and I have a pent-up demand for Regaining Control of my IT life. I’ve also documented those very pet projects below.
  4. Mentor. I’d love to find a Python mentor who can get me over the hurdles quicker. Some types of patience grow with age, other types diminish. And the “let me try the umpteenth way of reinventing a wheel that I’m sure is out there” type of patience has diminished a lot from my teens, when it really was about personally discovering the world of coding. Ideally, I’d find a Python mentor in physical vicinity here in Munich, probably half my age. Yes, I’d return the mentoring favour in some wake of life where I’ve got more experience. Hmm, next Thursday there’s a user group mtg in the local university LMU. Let’s see if I can make it!
  5. Sharing. As I progress and learn, I need to document what I learn in order to keep on track and not let me be distracted. That’s what experience has taught me. And working with open source, I’ve learned that great things come out of sharing, so I might as well blog what I document. If I’m as successful as in my wishful thinking, I’ll put my pet projects up on sourceforge in the years to come.
  6. Web resources. Documenting the web resources involved in learning PyObjC could be seen as a subgoal of the sharing goal, but it’s a big one. What to download and where, pointers to documents, tutorials, add-ons, forums and google groups is important enough to receive both attention and documentation.

The driving force is this set of pet projects, my itches that I want to scratch:

  1. kajtajm. Time tracking (date, from-time, to-time, project, freeform explanation about 40 chars) with productive grid input. That was the trigger for me to write yesterday’s blog and it was also the one I was referring to in the “Three key requirements”, “Input Screen” and “Output Screen” sections.
  2. kajbook. Bookkeeping, for personal expenses, for small self-owned companies, for projects. I wrote an ancestor in FOCUS (so I have fairly clear specs in my head) but haven’t been able to use it for years, for lack of a PC/DOS (or VAX/VMS or VM/CMS) command prompt.
  3. kajmejl. Email archive app for all my emails sent and received since sometimes in the last century, when I started using email. Parse the files from various generations of email (pine, eudora et al.) into a MySQL database. Index it to make it searchable. Free space by deleting the biggest attachments. Enable tagging and/or searching on people, projects, customers (which tend to change names over time, as centuries pass by). Enable pretty-printing of individual emails, and of forwarding them to Google Mail.
  4. kajdisk. Order is a key goal for me for 2010, and I despice retaining old CDs, DVDs and harddisks just because they might contain files that I don’t have any other backup of. So kajdisk loads the directories of the main backup drive into a MySQL database, and then checks whether individual CDs/DVDs/HDs contain any additional files. The app lists them and the user can then copy them (if they’re small and/or necessary) or choose to disregard them (if they’re unimportant) in order to immediately be able to throw them away. I expect to gain free physical shelf space from using kajdisk.

Finally, my Python / PyObjC learning goals. These are the steps in which I plan to progress.

  1. Hellöu wörld. Reading a text (in ISO Latin-1, hence the döts above the ö) from a MySQL database and writing it correctly onto a Cocoa screen. Requires installation at least of MyObjC and MySQLdb, and learning how to communicate between MySQL and Python.
  2. Basic language constructs. Loops. Declaring variables. Classes. The bare necessities (I plan to learn more while scratching the itches).
  3. MySQLdb. Understanding key components of the Python / MySQL database interface: SELECT, UPDATE, INSERT, DELETE. Navigating answer sets.
  4. Basic Cocoa UI elements. This I expect to be a lot like Visual Basic and Delphi, with a slightly different way to do binding between objects and the UI.
  5. Mathematical functions. Python seemed to know how to do a sum of an array natively, almost like APL. This is promising, perhaps there’s more fun to learn.
  6. An input grid. This worries me. Grids (matrix tables) were always a cause of concern when I coded in various versions of Visual Basic and Delphi. Often, they lack capabilities. Often, they let you change irrelevant attributes, but not the relevant ones. Often, trivial things require lots of workarounds. Let’s see about grid usability, both from a user and a coder perspective. Hmm, I suspect finding the right UI element to use will be the first hurdle.
  7. File system. How do I loop through the file system and retrieve all file attributes? Can I perhaps retrieve (or even write) EXIF attributes of JPG files? That would surely be a bonus.
  8. Regular expressions. I know how to grep and find, roughly, but I’ve always had to ask Giuseppe Maxia (and earlier David Axmark) for help on regular expressions. I suspect parsing through texts would benefit a lot if I learned regexps properly. But learning this isn’t a prerequisite for at least kajtajm, kajbook and kajdisk. Sure, kajmejl may require regexp knowledge.
  9. Clipboard, file export, PDF output. Those are things that aren’t requirements for the initial use of any of my pet project apps, but they clearly could help.

That’s it! I’m looking forward to experience my rebirth as a programmer during the course of 2010.

Links:


PlanetMySQL Voting: Vote UP / Vote DOWN