<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Dougma (dŭg·mə) n. &#187; django</title>
	<atom:link href="http://dougma.com/archives/category/django/feed" rel="self" type="application/rss+xml" />
	<link>http://dougma.com</link>
	<description>the truth according to Doug</description>
	<lastBuildDate>Mon, 03 Jan 2011 10:32:28 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>Blogging a project &#8211; The big picture</title>
		<link>http://dougma.com/archives/197</link>
		<comments>http://dougma.com/archives/197#comments</comments>
		<pubDate>Sat, 01 Jan 2011 07:30:53 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://dougma.com/?p=197</guid>
		<description><![CDATA[&#160; A dive into needs and wants and whys. Also see: Introduction &#8211; Lets make a bug tracker! Background &#8211; It&#8217;s not really a bug tracker! Time to get to some use cases, requirements, needs, wants, and technology. As mentioned in the last post, I need something to replace the features we currently have in [...]]]></description>
			<content:encoded><![CDATA[<p>&nbsp;</p>
<p>A dive into needs and wants and whys.</p>
<p>Also see:</p>
<ul>
<li><a href="http://dougma.com/archives/183">Introduction</a> &#8211; Let&#8217;s make a bug tracker!</li>
<li><a href="http://dougma.com/archives/186">Background</a> &#8211; It&#8217;s not really a bug tracker!</li>
</ul>
<p>Time to get to some use cases, requirements, needs, wants, and  technology. As mentioned in the last post, I need something to replace  the features we currently have in Lotus Notes. While this is a project inspired by issues at work, and I hope it will be used there, the actual needs and requirements are coming from me. Not everyone in my group would agree with what I am calling requirements. Some of my requirements are there because of specific pain points I am having, but which do not really bother others. But hey, it&#8217;s my project, so they don&#8217;t get a say.</p>
<p>With that, it is time for a</p>
<blockquote><p><strong>Disclaimer:</strong> The postings on this site are my own and don’t necessarily represent  my employer’s positions, strategies or opinions. Opinions and ideas  expressed here are mine and mine alone.</p></blockquote>
<h2><strong>Objectives</strong></h2>
<p>The primary objectives are:</p>
<ul>
<li>Increase visibility with ease of access (web interfaces or simple app)</li>
<li>Integrate with other existing systems (Perforce, Plone, Active Directory, release processes)</li>
<li>Preserve and/or migrate sixteen years of history (old bugs, API discussions, etc.)</li>
<li>Support existing workflows (fast ramp-up for users)</li>
<li>Minimal impact on schedules (get it up and running without anyone noticing until it is done&#8230; hi boss&#8230;.)</li>
</ul>
<p>That&#8217;s about it really.</p>
<h2><strong>High Level Functionality</strong></h2>
<p>Quick summaries of the functionality we will dive into.</p>
<ul>
<li><strong>Issue/Bug Management</strong> &#8211; Our needs are very simple, but specific. Multi-Project, RSS, large files, and read/unread markers being chief among them. The biggest issue here is how we manage the forks. Each bug/issue will have a status in multiple forks. There are multiple &#8216;active&#8217; forks which need to have their status set. As new forks are made, the system needs to know about them. Archived forks are not set on the bug. A bug is not fully closed until it is closed on all forks. Very few issue management systems do this out of the box, though many support similar work flows.</li>
<li><strong>API/Code Review</strong> &#8211; Nicely color-coded diff of changes to the API with the ability to make comments on the changes, and have multiple versions of the multi-file changes. This is key for the API review. General code review is secondary. This need not be fully integrated into Perforce via triggers. That option would be nice, but for the API review, it would be better to submit a changeset via some script (*cough* <a href="http://codereview.appspot.com/2635043/">upload.py</a>, gee&#8230; whatever system could I be thinking of for this&#8230;). Essentially we need a way to review patches which are not checked in anywhere. It would be nice to review changes which are pending or already submitted as well, but that is a want, not a need; we have that functionality in another trigger system already.</li>
<li><strong>Discussion Threads</strong> &#8211; Similar to a bulletin board, but with thread and read/unread requirements. The key features from Lotus Notes are the overview listing with read/unread status. The &#8216;topic&#8217; can be changed and is modified over time with a change history. Underneath in the main list view are threaded responses. Everything has read/unread markers. Think of a threaded e-mail archive, where the initial e-mail can be edited with a change history. BBS systems come close but miss the mark on the key requirements.</li>
<li><strong>Static Release HTML Help/Docs</strong> &#8211; <em>(sideline work)</em> Technically this is not something that is in Notes, and it already exists internally, but we need to integrate with it cleanly, so I am mentioning it. An app to smartly serve up static files at set locations would be simple enough to add and would replace some fancy Apache rewrite rules.</li>
<li><strong>Interview Management</strong> &#8211; <em>(bonus work)</em> Special discussion system for managing resumes, code problem solutions, etc. Has special permission requirements, and voting (+1, +0, 0, -0, -1).</li>
<li><strong>Machine Management</strong> &#8211; <em>(bonus work)</em> Very, very simple app for tracking who has what hardware specs on their machines. Not asset management. Just useful for when we have budget to get new hardware, or when a bug appears to be hardware specific.</li>
</ul>
<p>Some of this could be moved into Plone, or just abandoned. Much of the project management occurs in Plone and integrates across projects. Different projects use different tools, but everyone integrates at the Plone level. One restriction on the Plone side is, because many groups are using it, there is a very high barrier to adding new functionality or special customization. Also we do not have the option to go with a cloud hosted solution like some groups, due to our access to medical records. Just the access to such records means we can not risk it.</p>
<h3><strong>Issue/Bug Management</strong></h3>
<p><strong>Requirements:</strong></p>
<ul>
<li>Active Directory Integration</li>
<li>Perforce Integration (very minimal)</li>
<li>P4M Fork integration (one bug, many forks)</li>
<li>P4M Version integration (is the format valid)</li>
<li>RSS Feed (cheap Plone Integration)</li>
<li>Large file support</li>
<li>Multi-Project</li>
<li>Read/Unread Marking</li>
<li>Filters (open, closed, owned by, etc.)</li>
<li>Searchable</li>
</ul>
<p><strong>Nice to have:</strong></p>
<ul>
<li>Strong Perforce Integration, Checkins and Changesets (i.e. see the code from the issues)</li>
<li>Strong P4M Fork/Version integration (does a fork/version exist, active forks, etc.)</li>
<li>Smart links (turn @34245 into a link to the changeset, etc)</li>
<li>Colorized source code diffs</li>
<li>Nice WYSIWYG Editing instead of some wiki markup</li>
<li>Custom reports, queries.</li>
</ul>
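<p>The fork rule from the summary above (a bug is not fully closed until it is closed on all active forks) can be sketched in a few lines. This is only a minimal illustration, not the eventual design: the <code>Issue</code> class, the <code>add_fork</code> helper, the status strings, and the choice that a fork with no status counts as open are all assumptions made for the example.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    """One bug tracked across several forks (all names are hypothetical)."""
    title: str
    # Maps a fork name to a status string such as "open" or "closed".
    fork_status: dict = field(default_factory=dict)

    def is_fully_closed(self, active_forks):
        """True only when every active fork reports "closed".

        Assumption: a fork with no recorded status counts as open.
        Archived forks are simply left out of active_forks.
        """
        return all(self.fork_status.get(fork, "open") == "closed"
                   for fork in active_forks)


def add_fork(issues, fork_name):
    """Tell every issue about a newly created fork (assumed to start open)."""
    for issue in issues:
        issue.fork_status.setdefault(fork_name, "open")


bug = Issue("crash on empty input", {"mainline": "closed", "v5": "open"})
active = ["mainline", "v5"]
print(bug.is_fully_closed(active))
bug.fork_status["v5"] = "closed"
print(bug.is_fully_closed(active))
```

<p>Keeping the list of active forks outside the issue means archiving a fork is one change in one place, rather than an edit to every bug.</p>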
<h3><strong>API/Code Review</strong></h3>
<p>There is a difference between how we handle the API review and how we handle other code reviews. We have requirements and proposals. This can be a many-to-many relationship, but usually it is just one-to-one, or occasionally one requirement to many implementation proposals. Sometimes a bug will be referenced, as it may require an API change to fix. Often it is a client who will submit a requirements document, which then gets refined.</p>
<p><strong>Requirements:</strong></p>
<ul>
<li>Active Directory Integration</li>
<li>Multi-file changesets</li>
<li>Colorized changes</li>
<li>Full and partial change context</li>
<li>Multiple revisions of a changeset (i.e. one proposal, multiple changesets)</li>
<li>Requirements document support w/ linkage to proposal</li>
<li>Change history on requirements document</li>
<li>Clear history on the changes to the item being reviewed</li>
<li>Good comment system (duh!)</li>
<li>Integration with bug/issue system (may just be a hand generated link)</li>
<li>RSS</li>
</ul>
<p><strong>Nice to have:</strong></p>
<ul>
<li>P4M Integration (submit a changeset from P4M)</li>
<li>Perforce Integration (Code review of already submitted changes)</li>
<li>Nice WYSIWYG Editing instead of some wiki markup</li>
<li>Custom reports, queries.</li>
<li>Read/Unread markers (this might be moved to a requirement)</li>
<li>Searchable</li>
</ul>
<h3><strong>Discussion Threads</strong></h3>
<p>Discussion threads are interesting in that they closely resemble the functionality in a BBS system, but with some very important differences. The initial topic post is edited with a change history, and then there are threaded comments below. The threaded comments are their own posts. The best way to think about it is your standard mailman archive thread view, but where the author of the initial e-mail can go back and edit their post, and the responses can have different titles. Each entry has its own read/unread state. This could be achieved in Plone with a single page with change history and threaded comments except that the comments are at the bottom of the post, do not have read/unread state, and can not be listed as part of an overview tree/list. So they are really nothing like that <img src='http://dougma.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  Which is why it is hard to find something which supports this mode of discussion we are so used to in Notes.</p>
<p><strong>Requirements:</strong></p>
<ul>
<li>Active Directory Integration (why do I keep listing this?)</li>
<li>Multiple &#8216;boards&#8217; or whatever (multiple databases in Notes language&#8230; just call it Multi-Project&#8230;)</li>
<li>Topic listing with responses all in one list, with thread indentation (needs a picture to describe)</li>
<li>Change history on Topic level posts</li>
<li>RSS</li>
<li>Read/Unread Marking</li>
<li>File Attachments</li>
<li>Filters (by date, person, etc)</li>
</ul>
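<p>Since the single overview list with thread indentation is hard to describe in words, here is a rough sketch of it. The <code>Post</code> class, the <code>overview</code> function, the <code>*</code> unread marker, and the sample titles are all made up for illustration; read state is assumed to be a per-user set of post ids.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """A topic or a response; responses nest under their parent."""
    post_id: int
    title: str
    author: str
    replies: list = field(default_factory=list)


def overview(posts, read_ids, depth=0):
    """Yield one line per post, indented by depth, with "*" on unread ones."""
    for post in posts:
        marker = " " if post.post_id in read_ids else "*"
        yield "{0} {1}{2} ({3})".format(
            marker, "    " * depth, post.title, post.author)
        # Responses are their own posts, listed directly under the parent.
        yield from overview(post.replies, read_ids, depth + 1)


topic = Post(1, "File format change", "doug", [
    Post(2, "Re: header versioning", "ann", [Post(3, "Agreed", "doug")]),
    Post(4, "Endianness question", "bob"),
])
print("\n".join(overview([topic], read_ids={1, 2})))
```

<p>Each response carries its own title and its own read state, which is the part most bulletin board systems do not offer.</p>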
<p><strong>Nice to have:</strong></p>
<ul>
<li>Smart links (turn @34245 into a link to the changeset, etc)</li>
<li>Colorized source code diffs</li>
<li>Nice WYSIWYG Editing instead of some wiki markup</li>
<li>Custom queries</li>
<li>Searchable</li>
</ul>
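<p>The smart links item is small enough to sketch with a regular expression. Everything specific here is an assumption: the <code>CHANGESET_URL</code> address is a placeholder (the real changeset viewer is not named in this post), <code>link_changesets</code> is a hypothetical helper, and the output uses Markdown-style links only to keep the example short.</p>

```python
import re

# Placeholder address; the real changeset viewer URL is not given in the post.
CHANGESET_URL = "http://p4web.example.com/changes/{number}"

# "@" plus digits, at the start of the text or after whitespace or "(",
# so that e-mail addresses such as doug@123.example.com are left alone.
CHANGESET_RE = re.compile(r"(^|[\s(])@(\d+)\b")


def link_changesets(text):
    """Replace references like @34245 with a link to that changeset."""
    def replace(match):
        prefix, number = match.group(1), match.group(2)
        url = CHANGESET_URL.format(number=number)
        return "{0}[@{1}]({2})".format(prefix, number, url)
    return CHANGESET_RE.sub(replace, text)


print(link_changesets("Fixed in @34245 (see also @34250)."))
```

<p>The same function could be applied to issue text, review comments, and discussion posts, which is why smart links show up in more than one list.</p>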
<h3><strong>Interview Management</strong></h3>
<p>This is a fun one. It is just a very simple Notes discussion database (actually all of these except the bug management are just simple Notes discussion databases), with some extensions for voting. When a resume is approved via triage, an entry is added with it attached. We vote on resumes with a comment. We decide who gets a phone call from these votes, and then people are assigned to make the calls. There is a &#8216;response&#8217; made to the resume with notes from the phone interview. If the phone interview goes well we send out a programming test. We wash and test the response program. It gets posted unwashed as a response to the resume. The washed version with comments and notes is posted as a response to the unwashed version. Then we bring the person in for the real full group interview. Then we try to hire that person. Whenever we are done with the resume, it is &#8216;archived&#8217; with these notes where only the manager can see it. Very simple and a quick app to write.</p>
<p><strong>Requirements:</strong></p>
<ul>
<li>Active Directory Integration (blah blah blah&#8230;)</li>
<li>File attachments</li>
<li>Ability to &#8216;archive&#8217; entries to be seen only by manager</li>
<li>Voting by group members (+1, +0, 0, -0, -1) with comment.</li>
<li>Status (voting, needs phone interview, waiting for program, etc&#8230;)</li>
<li>Assign person to phone interview/code review</li>
<li>Read/Unread Marking (see a theme here?)</li>
<li>Filters (those which I have not voted on yet, needs votes, needs votes by person, a few others)</li>
</ul>
<p><strong>Nice to have:</strong></p>
<ul>
<li>Multiple repositories for different groups (call it Multi-Project)</li>
<li>Colorized source code?</li>
<li>Nice WYSIWYG Editing instead of some wiki markup</li>
<li>Searchable</li>
</ul>
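<p>The voting piece could look something like the sketch below. The weights given to each ballot, the rule that a vote must carry a comment, and the names <code>tally</code> and <code>missing_voters</code> are assumptions for the example; the post only lists the five ballot values and the filters.</p>

```python
from collections import Counter

# The five ballot values from the requirements. Assumption: +0 and -0
# show a leaning but carry no weight in the score.
VOTE_WEIGHTS = {"+1": 1, "+0": 0, "0": 0, "-0": 0, "-1": -1}


def tally(votes):
    """Count ballots and compute a net score.

    votes maps a voter name to a (ballot, comment) pair.
    """
    counts = Counter()
    for voter, (ballot, comment) in votes.items():
        if ballot not in VOTE_WEIGHTS:
            raise ValueError(
                "unknown ballot from {0}: {1!r}".format(voter, ballot))
        if not comment.strip():
            # Assumption: "with comment" means the comment is mandatory.
            raise ValueError("{0} voted without a comment".format(voter))
        counts[ballot] += 1
    score = sum(VOTE_WEIGHTS[b] * n for b, n in counts.items())
    return counts, score


def missing_voters(group, votes):
    """The "needs votes by person" filter: who has not voted yet."""
    return sorted(set(group) - set(votes))


votes = {
    "ann": ("+1", "solid systems background"),
    "bob": ("-0", "thin on testing experience"),
}
print(tally(votes))
print(missing_voters(["ann", "bob", "cy"], votes))
```

<p>Keeping the ballots as strings avoids the problem that +0 and -0 are the same number but not the same vote.</p>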
<h3><strong>Machine Management</strong></h3>
<p>It&#8217;s just a simple table with change history!!! Bah, create a Plone page and be done with it! Why is this still in Notes?</p>
<h2>The Big Picture</h2>
<p>Really everything currently in Lotus Notes, sans the bug tracker, is based on the basic Notes discussion database. It is best to look at all of these existing pieces of functionality and see if the lines drawn between them are real or artificial. What I mean by that is, much of the code commentary need not be done in discussion threads, but instead as part of a generic code review application. The distinction between API review and code review is important due to who gets input and when, but not much beyond that. The discussion threads are a catch-all for many different types of design resolution that do not fit in the API discussion, or the bug tracker. Plone is most likely the best place for much of what is currently in the discussion database, with the code and file format stuff moving to a code review app. The machine management database is nothing but a poorly implemented Plone page with a table. The real things which are needed are a strong issue tracker, a powerful and configurable code review tool, and a very simple voting app for resumes <em>(another voting app? *sigh*)</em>.</p>
<p>So next up is evaluating existing technologies to figure out which fit best, while taking into account both the background and the big picture. Just in time for me to go back to work in a few hours too&#8230; So much for getting it done over vacation. Failure #1 took long enough.  <img src='http://dougma.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://dougma.com/archives/197/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Blogging a project &#8211; Background</title>
		<link>http://dougma.com/archives/186</link>
		<comments>http://dougma.com/archives/186#comments</comments>
		<pubDate>Tue, 28 Dec 2010 16:35:09 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://dougma.com/?p=186</guid>
		<description><![CDATA[In the introduction I covered the concept of blogging a new project from beginning to end. The reason for the vagueness of &#8216;project&#8217; and &#8216;experiment&#8217; is because I have not even decided on a name for the project yet. That is part of the entire &#8216;from concept to end result&#8217; thing. If I had a [...]]]></description>
			<content:encoded><![CDATA[<p>In the <a href="http://dougma.com/archives/183">introduction</a> I covered the concept of blogging a new project from beginning to end. The reason for the vagueness of &#8216;project&#8217; and &#8216;experiment&#8217; is because I have not even decided on a name for the project yet. That is part of the entire &#8216;from concept to end result&#8217; thing. If I had a name already it would not really be from the beginning now would it.</p>
<p>Now it is time to give the concrete context in which this project is being done. This could be considered the pre-requirements. It is important to know how your project is going to be used and the environment it is going to be used in. If you want your project to succeed, this information is crucial. This is where you find out what is really needed versus what is requested, especially when the person declaring what is wanted is yourself.</p>
<p>This is going to be a very long post, but I will be touching on almost every sentence later on, as this all needs to be taken into consideration. It is crucial to look at these big picture issues when designing some tool or application which will be used in that context. To properly explain the why behind decisions I make later, I would rather dump it all here and refer back to it. I hope it will be interesting to some at least.</p>
<p>As the background for this project is going to delve into the processes and development procedures at my day job, it is time for a,</p>
<p><strong>Disclaimer:</strong></p>
<p>The postings on this site are my own and don’t necessarily represent my employer’s positions, strategies or opinions. Opinions and ideas expressed here are mine and mine alone.</p>
<p><strong>Who we are</strong></p>
<p>My title is &#8216;Principal Research Engineer, MREC&#8217;. &#8220;MREC&#8221; is the modular speech recognizer project. We are a development group within research, in charge of productizing the algorithms and features needed by our extensive research teams, and delivering a core speech recognizer. We are a small team of 8 developers. There are many internal clients with technology built on top, and we are not the only engine in the company. Everyone on the team wears many different hats. We are all developers, researchers, operations engineers, QA engineers, release engineers, technologists, and much more. Everyone works on everything, down to the build, release, deployment, and source code management tools. No one is irreplaceable.</p>
<p><strong>How we Work</strong></p>
<p>There has been a wave of &#8216;agile&#8217; going through the company which I personally find both invigorating and frustrating. At issue here is that we are really talking about &#8216;<a href="http://agilebut.com/">agile-but</a>&#8216; at best. What is frustrating is that our team has been agile since before agile existed. We release often, and not all releases are used by all our clients. Some are only really used by our team. Most often releases are only used by our researchers, and never see a product. A release is made either because we would release on a date (regardless of what is in the release) or because a feature or set of features was just completed. It has always been that way.</p>
<p>We do mainline development with continual integration. That is, we have one single branch in which all the work is done. When a major release occurs we fork the project, and the newly created fork is a maintenance fork which only has bug fixes. Any bug fixes are first implemented in the mainline and then back-ported to the forks which we determine need them. We try to only keep one or two forks live, which means sometimes forcing clients to upgrade to a newer version even when they only think they care about a single bug affecting them.</p>
<p>Notice I have not mentioned branches or branch development. We don&#8217;t do that. Check in, and check in often. Update with the mainline multiple times a day as needed. All tests must pass before checking in. Every bug is really two bugs, the bug and the missing test for it; both must be fixed to consider the issue resolved. This may sound like it slows down development, but just the opposite is true, and I would never work any other way now. We do have the ability to branch and work in isolation, but we prefer to keep that type of thing very short, and to be honest I can&#8217;t think of a time it has been needed. The point is not that you need to integrate with other people&#8217;s changes, but that you need to get your changes into the mainline immediately so that everyone else can integrate with you.</p>
<p>There are code reviews. We only implement what is needed to solve the underlying problem or request. We do not implement what is requested, but what the requester needs, and rarely go beyond that. We do not implement features that are not used. We love to remove code and unused features. Over the past sixteen years we have removed three times more code than the current size of the project. We have competitions to see who removed the most code. That is the only code metric we look at with any seriousness. Our client facing API is reviewed by the entire team and all clients, and is approved before being checked in. The documentation for the API is in the API headers. We bikeshed on the API endlessly; or at least it sometimes feels that way.</p>
<p>None of this should look surprising, revolutionary, or out of the norm. All the major open source projects I know of work this way; more or less.</p>
<p><strong>Communication is the Problem</strong></p>
<p>As I said, there has been this wave of &#8216;agile&#8217; in the company. Recently someone said to me, &#8220;I saw this month&#8217;s sprint document for MREC. It is great to see that MREC is going agile too!&#8221;; and I bit my tongue. I wanted to say, &#8220;You keep using those words. I do not think they mean what you think they mean.&#8221; When a term like &#8216;sprint&#8217; is used, it does not have the meaning I expect it to have. Let me be clear here: the problem is me, not everyone else. The problem is not inherent in the systems, but in the communication: my communication, and the communication between our group and the rest of the company.</p>
<p>Our group has been operating somewhat like a black box. Different clients would file requests and our manager would work with the clients and put together a general schedule for getting features in, and re-prioritize as needed. This is a group effort to figure out what needs to get done and in roughly what order. Our clients do not talk much to each other for prioritizing our work, as we do the prioritizing. The end result is, while we have advertised what we are working on, and what goes into a release, these things have been very hard for our clients to track. They know when a feature they need is in a release, but then they have to accept all the other features for all the other clients, which they have not been tracking. One could argue that this is not our group&#8217;s fault, as we do make all this information available in one form or another; but that is not really fair, and as we will see, &#8216;one form or another&#8217; hides a multitude of sins.</p>
<p><strong>Communication is the Solution</strong></p>
<p>Much of this is old news and we have changed much in what it appears we are doing. In truth we have not changed the way we develop at all. We instead are exposing what we are doing and allowing our clients more control over our priorities. The end result is that while there is no real change in our priorities, our clients understand them much better and feel like they can predict things better using agreed upon terms like &#8216;sprint&#8217;. In truth it is just a change in perception; instead of not reading internal web pages and e-mails with text attachments, they are not attending meetings, not reading different internal web pages, and not reading different e-mails with Excel attachments. But these are meetings, pages, and e-mails they asked for and we designed together, versus things our group dictated; which makes all the difference in the world. Yes, I am being hyperbolic.</p>
<p>Even with this added communication, which is real and effective, there are still problems. They center around our tools and environment. We are in the late parts of migrating to new systems, but all this has done is highlight the flaws and weaknesses in the systems we have yet to migrate.</p>
<p><strong>The Environment</strong></p>
<p>The different teams have different development styles which best fit their needs. Same goes for the tools used. The source code management we use is an in-house wrapper on Perforce called P4M. This is open sourced as part of the <a href="http://sourceforge.net/projects/devtools/">DevTools</a> project, which is woefully out of date; we will be releasing a new version soon. We have multiple projects hosted on a single server. Our team has four projects including MREC, and other teams have projects as well. P4M enforces a directory layout including forking and branching. We chose Perforce over SVN or a DVCS for many reasons. We need to deal with huge files (&gt;2Gig) and a basic install takes up &gt;32Gig. The MREC tree alone has 300GB of history. We need to be very careful of permissions. Our engine is used in many medical transcription applications, so we must be very careful. We try never to check in unwashed data which might contain any <a href="http://en.wikipedia.org/wiki/Personally_identifiable_information">PII</a>. In the off chance that some does get through we need strict centrally controlled permissions. At any time we need to know who has what on which machines. For patent and litigation reasons we need everything centrally managed with strict backup rules. Perforce really is the only affordable system out there which can handle this. The entire 16 year history of the project is maintained with proper dates and attribution.</p>
<p>Our project planning is managed external to the source code management, and the bug tracking and issue management. This at first might sound odd, but with so many internal clients, the priorities are often changing. In our issue management system, we have 3 severity levels (low, med, high), and a fourth level called &#8216;request&#8217;, for all features and improvements. Requests are automatically lower than any bugs. This has caused some friction with clients, especially research. The problem is, the person filing the request really has no clue what the real priority is for a request. That is, unless they are a project manager and have consulted all our other client product managers, and want to take on the task of updating the priority over time. As such the real priorities are managed in another system which links back. Known bugs on the other hand are always more important to fix than a new feature or improvement. This is a royal pain to try to communicate, and still results in heated discussions; we will get to this in a later post.</p>
<p>Releases are made to network directories which are accessible from the same path on all our grids. There is a &#8216;current&#8217; symlink for the latest supported version, the version is always increasing, and the directories are named with the version number. Each checkin to a codebase gets its own unique increasing version number (managed by P4M). The release includes built static HTML documentation. This documentation is served up by an Apache instance, or can just be accessed directly. We are migrating away from a hodge-podge of doxygen, pydoc, and other in-house hacks over to a standardized sphinxdoc system. This includes taking the help from our API headers and generating sphinxdoc which is cross-referenced with our Python interface. The overarching project planning and release information is in a Plone instance which is used by all of research and development. This is all magically cross linked and searchable; or will be once the sphinx stuff is done, some of which will be tackled by this here project. Our sphinx extensions will be released as part of <a href="http://sourceforge.net/projects/devtools/">DevTools</a>, and some of the release exposure stuff will be dealt with by this new project.</p>
<p><strong>Lotus Notes</strong></p>
<p>There are some pain points which have been hitting me over and over again now that the other parts are so much better and fully integrated. Specifically the way we review changes to our API, discuss architecture and format changes, and track issues. All three are managed in a 16 year old Lotus Notes system which has not changed much in 10 years. Very few groups in the company still use Notes, and most can not be bothered to install the client. Most have other tools for these things which have web interfaces, or similarly, are causing communication problems. Requiring a tool like Lotus Notes does not sound like it would be much of an issue at first, especially as Perforce is required for code access. But when you realize that this includes a new e-mail address which is not integrated with your &#8216;real&#8217; Exchange address, and you need help to set it up to make sense, and that we have no IT staff who know or understand Notes, nor a support contract; heck, for all the nice things we have in it, I don&#8217;t want it. There is 0 chance of getting a web interface as well. In short it is an isolated island which plays well with itself, and can be linked to externally with some contortions (on Windows only, with specific OLE/ActiveX plugins), but does not play well with others; and playing well with others is the point.</p>
<p>I do want to give a shout out to <a href="http://www.linkedin.com/profile/view?id=2393944">Eric Ochieng</a>, who set up the Notes infrastructure years ago; it has been working flawlessly since, with no real support or maintenance.</p>
<p><strong>What this project really is</strong></p>
<p>This project really is not a bug tracker. What it really is, is something to replace all the functionality we currently have in Lotus Notes, with other things which integrate into all the other things we do. Part of that is indeed issue tracking, but focusing on just that would be a failure from the outset. We could go with something like ClearCase+ClearQuest+Replication+Send-All-Our-Revenue-To-IBM, and spend months trying to get it all working, but that is just silly. There are a number of open source and pay-for solutions which almost fill our needs. It is that &#8216;almost&#8217; which is the problem. There is nothing out there which does the things we need, let alone covers the things we want. This project is to dive down into what the real needs are versus the wants, and to find the solutions out there which come closest while integrating together and with what we already have. The pieces which are still missing, I will create from whole cloth of some type. Python is not a requirement, but I will be leaning towards it, as I am the only person on our team who knows Ruby or Java well enough to support those languages; we are allergic to .NET for reasons I will go into later.</p>
<p><strong>Up next:</strong> The Big Picture &#8211; a dive into needs and wants and whys.</p>
]]></content:encoded>
			<wfw:commentRss>http://dougma.com/archives/186/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A Grand Experiment</title>
		<link>http://dougma.com/archives/183</link>
		<comments>http://dougma.com/archives/183#comments</comments>
		<pubDate>Tue, 28 Dec 2010 08:51:42 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[work]]></category>

		<guid isPermaLink="false">http://dougma.com/?p=183</guid>
		<description><![CDATA[Ok, maybe &#8216;grand&#8217; is a bit much. For a long time now I have wanted to blog a project. That is blog the progress of working on a project from initial concept all the way through to the completed result. I wanted to do this for the old PyCon website, but the time pressures for [...]]]></description>
			<content:encoded><![CDATA[<p>Ok, maybe &#8216;grand&#8217; is a bit much. For a long time now I have wanted to blog a project. That is, blog the progress of working on a project from initial concept all the way through to the completed result. I wanted to do this for the old PyCon website, but the time pressures for that work did not allow for it. I have yet to write one line of code, or create the code repository, or anything; though I have been doing continual research for the past eighteen months. To be honest I have no time for this new venture, and no way on earth do I have time to blog it. I will be spending twice the time blogging it as coding it. Thus this is doomed to failure. But that is part of the point.</p>
<p>The point is not to watch as some &#8216;master&#8217; weaves a new project and does things in the &#8216;best&#8217; way. It is not for me to impart some special wisdom. Oh, no&#8230; The point is for me to fall on my face. Hard. Repeatedly. In full view of everyone. When I see something done right I rarely learn anything from it. The most revolutionary concepts often go completely unnoticed because they just work. And no one remembers things that just work; until they don&#8217;t. So this &#8216;Grand Experiment&#8217;, I hope, will fail many times over.</p>
<p>Another reason why I have not done this before now, is that I have not really had a project which will work with this concept. Oh I have some small utility projects (django-app-plugins chief among them) which desperately need my attention, but I am not actively using them for anything. I have found that when you work on a project like that without an actual site using them, something driving the development, you end up having to envision how you would like to use it. Then when an actual use case comes up it does not quite fit what you thought you might use it for. So this is going to be driven by real use cases for a mission critical application which will be used by a small development team.</p>
<p><strong>How the blogging will work. </strong></p>
<p>There will be an introduction (i.e. this post), a background post, a big picture post, and then breakdown posts. The breakdown posts will cover the work as I do it. There will be no overarching organization beyond whatever I have had time to do and get checked in. The breakdowns will cover the work which was done in the interim commits, and/or new use cases. Each use case will also include research done on options for implementation details. Things which do not work will be highlighted. Even things which do work will not work perfectly. The entire reason for this project in the first place is that nothing in existence works well enough.</p>
<p><strong>So why now? </strong></p>
<p>Good question, self. Well, there has been a pain point at work which keeps coming up. I have a mandate to specifically NOT work on this. I also am on vacation, and have not done any programming for me. But I have seven days, a wake, a funeral, games night, a homeowners association management company migration, PyCon issues, three parties, two kids, and a none too happy wife. Did I mention that I do not have time to do this?</p>
<p><strong>So why blog it at all? </strong></p>
<p>Another good question, self. Because doing this in public will produce better code. At the last Django Meetup someone, who shall remain unnamed, commented that they produce much better code when it is open sourced than when it is done for work, or just to scratch that itch. Everyone present agreed that it was the case for themselves as well. And finally, by doing it this way, it is clear that while this project may end up being used at work, it is done by me, on my own machines, on my own time, and is completely tangential to the technology my company cares about. It is related to another piece of software which my group at work is already open sourcing. Oh, and the entire falling on my face thing&#8230; that is just too much fun to pass up.</p>
<p><strong>So what is it?</strong></p>
<p>It&#8217;s a bug tracker.</p>
<p>Wait WAIT!!! come back&#8230; don&#8217;t stop reading!</p>
<p>The background and overview posts will go into the research behind all the alternatives, and this project will be more about leveraging existing technology, extending existing projects, writing the few apps which are needed to fill in little bits, and more than anything else, gluing it all together. The emphasis will be on the use cases and work flows which our group uses and dealing with less than enjoyable requirements. I do not even know if this will be used by my group at work in the end, but that is not really the point after all. There is going to be <a href="http://sphinx.pocoo.org/">sphinx</a>, django, <a href="http://jqueryui.com/home">jquery UI</a>, <a href="http://code.google.com/p/rietveld/">rietveld</a>, <a href="http://www.perforce.com/">Perforce</a>, <a href="http://sourceforge.net/projects/devtools/">DevTools</a>, and much, much more. I was told specifically not to work on it; which has almost the same effect as telling me something is impossible.</p>
]]></content:encoded>
			<wfw:commentRss>http://dougma.com/archives/183/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Sarah&#8217;s first Camping</title>
		<link>http://dougma.com/archives/91</link>
		<comments>http://dougma.com/archives/91#comments</comments>
		<pubDate>Tue, 27 May 2008 04:52:43 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[camping]]></category>
		<category><![CDATA[django]]></category>
		<category><![CDATA[dragon]]></category>
		<category><![CDATA[pycon]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dougma.com/archives/97</guid>
		<description><![CDATA[Well no bears this time, and no pictures yet either. We had a fun time at our annual first camping of the year, and Sarah had a blast. The weather was perfect. Everything went great. Josh was a great big brother showing Sarah the ropes and introducing her to everyone. One of the highlights was [...]]]></description>
			<content:encoded><![CDATA[<p>Well no <a href="http://www.dougma.com/archives/37">bears</a> this time, and no pictures yet either. We had a fun time at our annual first camping of the year, and <a href="http://www.dougma.com/archives/57">Sarah</a> had a blast. The weather was perfect. Everything went great. Josh was a great big brother showing Sarah the ropes and introducing her to everyone. One of the highlights was playing tag with Josh, his friend Quit and my friends Matt and Deidra.</p>
<p>But for some reason I just did not relax. I didn&#8217;t get to spend as much time with friends or just vegging on the beach as I wanted, and I really have no one to blame but myself. Part of the problem is that I just could not shut my brain off. Last year I got a notebook, and it ended up being my &#8216;PyCon&#8217; notebook. This year, I just didn&#8217;t seem to have time, oddly enough. Code freeze at work was Friday and that, I am sure, didn&#8217;t help. The official fork will most likely be this upcoming Friday.</p>
<p>I have had no time to work on any of my python projects, and it is driving me crazy. There are so many fantastic things happening with <a href="http://code.google.com/p/django-survey/">django-survey</a>, and <a href="http://code.google.com/p/django-hotclub/">Pinax</a> is picking up steam. I need to put the old PyCon &#8217;08 stuff into archival mode, and start up &#8217;09. There is some very very interesting stuff going on with the DFW Python group that I want to help out on as well. There are so many fantastic things being worked on right now by incredible people, and I feel downright claustrophobic not being able to do anything myself. I have only been able to attend one Boston Python Meetup so far this year!</p>
<p>It looks like I will be in Montana the first week of July this year (my annual pre-PyCon-Tech kickoff-kickoff). Barring any project &#8216;<a href="http://www.dougma.com/archives/41">issues</a>&#8217; I hope to get my act together then (with respect to PyCon software for 2009). If there are any pythonistas in the Missoula area, please send me an e-mail! For now I guess I should get back to unpacking the car.</p>
]]></content:encoded>
			<wfw:commentRss>http://dougma.com/archives/91/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Generosity of the Python Community</title>
		<link>http://dougma.com/archives/88</link>
		<comments>http://dougma.com/archives/88#comments</comments>
		<pubDate>Fri, 25 Apr 2008 04:38:58 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[pycon]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dougma.com/archives/90</guid>
		<description><![CDATA[Steve Holden is participating in the 5K Race for Hope. He is looking for people to sponsor his run. Lets show the other groups the generosity of the Python Community! (Sorry Team Hopkins, Steve got to me first ) If Steve is willing to go the full 5K distance, we should be able to support [...]]]></description>
			<content:encoded><![CDATA[<p>Steve Holden is participating in the <a href="http://www.braintumorsociety.org/site/TR/Events/08_Race_For_Hope/1371875700?pg=entry&amp;fr_id=1230" target="_blank">5K Race for Hope</a>. He is looking for people to <a href="http://www.braintumorsociety.org/site/TR/Events/08_Race_For_Hope?px=1497445&amp;pg=personal&amp;fr_id=1230" target="_blank">sponsor his run</a>. Let&#8217;s show the other groups the generosity of the Python Community! (Sorry Team Hopkins, Steve got to me first <img src='http://dougma.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' />  )</p>
<p>If Steve is willing to go the full 5K distance, we should be able to support him with some cash. With the exception of this past year, it has been his fundraising efforts which have kept PyCon so cheap. Let&#8217;s put some of those saved pennies towards a great cause!</p>
]]></content:encoded>
			<wfw:commentRss>http://dougma.com/archives/88/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Google App Engine: The good, the bad, and the ugly?</title>
		<link>http://dougma.com/archives/84</link>
		<comments>http://dougma.com/archives/84#comments</comments>
		<pubDate>Sun, 13 Apr 2008 07:15:09 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dougma.com/archives/81</guid>
		<description><![CDATA[I have been holding off on writing this post as I prefer to fully form an opinion. At the writing of this there are almost 600 blog posts about the new google hosted application. Most seem to me to be flailing around the actual core of what this new little beastie is. Some are comparing [...]]]></description>
			<content:encoded><![CDATA[<p>I have been holding off on writing this post as I prefer to fully form an opinion.</p>
<p>At the writing of this there are almost 600 <a href="http://www.technorati.com/search/%22google+app+engine%22?authority=a4&amp;language=en">blog posts</a> about the new google hosted application. Most seem to me to be flailing around the actual core of what this new little beastie is. Some are comparing it to Amazon offerings, some see it as a threat to commodity hosting, and some as the dawn of a new computing revolution. A few highly respected people see this as bringing application development to the people the way that html/aol/myspace brought web development to the people. Many see this as a validation that python is an enterprise level platform. While I believe python is just that, I do not yet see this as a validation. The validation comes with Google App Engine&#8217;s success. Not that the language needs this added validation. As for the revolution, time will tell.</p>
<p>Google is touting this as making the web a platform. A platform for development, essentially replacing the desktop as where applications get developed and deployed. They do all the busy work of setting up the hardware, configuring systems for monitoring traffic, setting up the database, setting up the source control system, bug tracking, and all the rest, and let you focus on writing the application. Also you get the power of Google&#8217;s massive data centers with literal warehouses of machines and disks and their custom database. They bill this as being a platform for building your web based applications, a user base, business, and revenue stream. Of course one revenue stream will be AdSense, further promoting the Google advertising juggernaut. This is all fantastic, but there are limitations (as there must be). I put the limitations at the end.</p>
<p><strong>What is Google really up to? </strong></p>
<p>Google is not releasing the App Engine in a void. There are many other services that google has been rolling out over time (and many quite recently) which need to be looked at in order to get a proper view of what is happening. First let&#8217;s step back a bit and look at Google&#8217;s past. In the past when google released a feature with an API, people would rush out and start building mashups. Mashups which combined parts of Google, and parts of other systems. Systems google often had little or no control over. Early on Google revoked some keys when things got out of hand. Very early on there was some bad PR. Some mashups went away, some came back, some just died out. There was a wealth of data, information, and potential revenue behind those mashups, but it was out of Google&#8217;s hands by and large.</p>
<p>In the meantime Facebook came along and changed what it meant to write a web app. No longer were applications monolithic disconnected things. They were widgets which plugged into a page. They were cool, integrated, socially networked, and shared. They were things people paid real money for. They were things people were using to generate adwords revenue! Google created their own apps. They created them for all the other social sites and the desktop. They did not care who was the hot new trend, as long as they had a share. But they have to play by other people&#8217;s rules and APIs.</p>
<p>So google has their search, mail, maps, and documents, online bookmarks, and calendar. They have an rss reader. Others are making mashups, and now many of those are occurring on FaceBook as 3rd party apps. Google releases some extensions for form filling on the docs, and integrating charts. They release a data API. They release OpenSocial as an attempt to standardize all these social networks and the core of what their apps provide; the social connections. They release custom site hosting (without announcing it except in a blog posting). They now have all these great applications and pieces of applications. They have a means of creating, editing, and hosting static html and data. What they lack is a framework to integrate everything. Something where they host the mashups. Something that they can do the deep data mining on. At least that was the case until App Engine came along.</p>
<p>Google claims in the very opening of their announcement that the App Engine is all about the developers. It is all about the people out there who develop neat and interesting things and the feedback loop that creates. The creativity of the masses. That has always been the key to Google&#8217;s overall success. They provide the tools, and others create all those great mashups, sites, and apps. This is not about you creating your cool new app. This is about you creating your cool new Google mashup app utilizing all the other google APIs. They are not all there yet, but they will be. The crucial one, the user backend, is already there. None of the other offerings require their python API. They already have javascript and IFrame, and other means of integrating which were developed for integrating with your blog or MySpace or FaceBook. But make no mistake, they are coming to GAE.</p>
<p>In short this is about taking all those Google pieces and parts and creating the &#8216;next big thing&#8217; and using the developers out on the internet to do it, as they are the ones who will do it anyway and now they can do it for Google. Google gets their precious data, their ad revenue, and at some point people get to pay for the privilege of developing apps for them (either via ads or real money for removing those quotas).</p>
<p>Now comes the really cool part. The SDK includes everything you need for running locally. They have the Google Gears framework for making your apps work both online and offline on your desktop. Integrate all that fully and there really is no difference between your online web based apps and your desktop apps. There is still a long row to hoe before it gets to that point, but the pieces are falling into place.</p>
<p><strong>Why Python?</strong></p>
<p>There are a number of theories about the real reasons for choosing python. Most believe it&#8217;s because python is one of Google&#8217;s 4 primary languages. I do not believe that exactly. If this could have been done in Java, they would have done that. PHP is the only other &#8216;language&#8217; that could approach what they want to do. As what they want is a platform for developing mashups with their existing technologies on a massively distributed scale by unknown random people, here is a short list of requirements:</p>
<ul>
<li>Easy to develop in (who would develop in prolog?)</li>
<li>Sandboxable (including no ability to crash the server or corrupt ram)</li>
<li>No spawning of processes/threads (or other things to bypass cpu/process management)</li>
<li>No connections in or out of the app except those expressly controlled via an API</li>
<li>Easy means of administration (for the developers)</li>
<li>A language for which the Google API&#8217;s are already available</li>
<li>Low overhead for deployment on the servers [initial startup can&#8217;t be too slow, later requests must be extremely fast]</li>
</ul>
<p>I know of no other language which meets these requirements. PHP comes close, but would require a partial lobotomy (where python just has some modules removed or limited). Also PHP is not one of the languages that there are API clients for. I know that people are clamoring for other languages. All I can say is don&#8217;t hold your breath. I just do not see it happening any time soon.</p>
<p>[UPDATE: as a commenter points out, google is quite dedicated to python and has many core programmers on staff including the language creator. This is a great help for getting things done and adding validity to the project. Read the comments to hear my thoughts on Ruby.]</p>
<p><strong>Growing Pains</strong></p>
<p>App Engine is in its infancy. As with all their Beta projects there are problems. The main problem is how they are dealing with the problems. In short they are overwhelmed. People are asking for PHP, and for their favorite python projects to be supported. They made the mistake of claiming that most python frameworks will run on it without putting up the proper CAUTION signs. It is possible to get Zope to run on it with some work. All that is missing is the hook to use the google database instead of the ZODB as the backend, a few minor tweaks, and use of the WSGI adapter. Twisted is just out due to the signals, and the threads, and crucially, the tcp connections. One of the problems is that people expect that XYZ module should just work, and that it&#8217;s the App Engine team&#8217;s job to make that happen. The team seems to feel that they provide the framework and others should do the porting. There are also reports of bugs not being responded to in a timely manner. This is a bit laughable given the sheer number of bugs currently reported and the 15 or so engineers they have dealing with all the App Engine deployment issues. I am sure that no one expected to have to deal with flamewars in the bug tracker. Or that thousands of people would post +1 comments in the bug tracker making it next to unusable (some people just can&#8217;t read instructions). I would not expect all the current bugs to be triaged until late next week or the week after.</p>
<p>Most of the complaints seem to be about the limitations put in place. I can understand that, but I can also understand why hell will freeze over before most of them are lifted. When it comes to an initial deployment it seems quite generous and unrestricted. Insanely so. If you think about what it takes to deploy something like this, at this level, things start to click into place. How would you do it? How would you manage the issues, security risks, vectors of abuse? It is great to say you want to create a thread to accept a certified https connection, but if you are making that request, then you have no clue about the technical aspects behind that request or the technical aspects behind the App Engine.</p>
<p><strong>Current Limitations</strong></p>
<p>1. No long running processes</p>
<p>These are run once executions, and there is a time limit of a few seconds. Think of this the same way you would think about a PHP page.</p>
<p>2. No reliable state between runs</p>
<p>There is potential state from one run to the next, but you should not rely on it for large deployments. All state and persistent data should be stored in the database (or via some neat hacks). NOTE: this is more from my reading between the lines and knowledge of load balanced grid deployments. I.e. I do not trust their &#8216;<a href="http://code.google.com/appengine/docs/python/requestsandappcaching.html">cache</a>&#8217; system as something that can be relied upon. Why? Because we are talking about nodes and <a href="http://code.google.com/appengine/docs/python/sandbox.html">sandboxes</a>.</p>
<p>3. No incoming TCP connections</p>
<p>No binding to sockets, etc. These are Google&#8217;s servers. Even they do not know which node your http request which starts the app will be run on; no way of knowing which IP it really will be. Only apps are running on these nodes. This means no mixing of non-app and app requests. No twisted or zope admin instances. For google to provide a proper balanced network (with proper dispatching), it has to be that way (well at this phase in the game at least).</p>
<p>4. Limited connections out</p>
<p>Google has a url API for making http and https requests out to other servers, a connection to a database and a mail API. Those are the only outgoing network connections, and all are bound in API&#8217;s. If you were allowing anyone to run programs on your servers would you want them to be part of botnets?</p>
<p>5. No https</p>
<p>This is not static IP hosting, no cert for you! There are some things that can be done, but there would be cert warnings, etc. Granted this does not stop you from integrating with PayPal, or Google Checkout, where the https checkout is handled by a different site (insanely weak). [UPDATE: yes I know static IP's are not required for certs, but they are required if you would rather people not get the cert warning or have IE7 mark the site as 'insecure'. And google will not pay for a cert per app, nor will it get a single cert for all apps, some of whose authenticity they are not really sure of (a phishing app based on Adrian's dynamic html-&gt;template tool for instance). I do expect them to support something in the future, but that is a ways off and will not be for free.]</p>
<p>6.  No spawning new processes  (or signal overriding)</p>
<p>Well no big surprise here. Starting new processes could be very dangerous for all, and signal overriding&#8230; well that could make it hard for google to safely stop a rogue app (among other things).</p>
<p>7. No creating new threads!</p>
<p>Ok, this is a bit strange at first blush, but if you have ever dealt with grid deployments, or taught a CS 102 course (where you start covering semaphores and mutexes, and IPC) you have had the experience of a rogue multi-thread app taking down a machine. Part of the problem is that creating a new thread is very much like starting a new process. One of the interesting things about starting a new process is that the operation does not adhere to the nice protocol. It gets the CPU to do that start no matter what, and at the kernel priority (which is not very nice). New threads behave the same way, and are a PITA to deal with when trying to take down a process which has gone rogue. I hated that lab. I have other theories behind why they do not allow this (but that is for another post).</p>
<p>8. No &#8216;real&#8217; filesystem</p>
<p>Well there is no real access to the file system. Not the &#8216;real&#8217; file system. As such certain things like tmpfile are not present (as there is no /tmp directory).</p>
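<p>A minimal sketch of the usual workaround, assuming the data is small enough to hold in memory: build the content in an in-memory buffer rather than a temp file. The <code>render_report</code> helper is hypothetical, it is written against current Python rather than the App Engine runtime, and it has not been tested inside the sandbox.</p>

```python
import io


def render_report(rows):
    """Build a tab-separated report in memory instead of in a temp file.

    `rows` is an iterable of (name, count) pairs. This helper is made up
    for illustration; it is not part of any App Engine API.
    """
    buffer = io.StringIO()
    for name, count in rows:
        buffer.write("%s\t%d\n" % (name, count))
    # The whole report comes back as one string, ready to hand to a response.
    return buffer.getvalue()


print(render_report([("requests", 3), ("errors", 0)]))
```

<p>Anything that has to outlive the request would still need to go to the datastore; the buffer only replaces scratch space.</p>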
<p>9. Crippled import/bytecode</p>
<p>Well that is an overstatement. Google has written their own import replacement, and modified the bytecode (I think) from standard python. I have some theories on why for a later post, but the deal is, forget about using marshal, imp, or even some of the package __import__ hooks, and cPickle is just pickle.  Part of the reason is because of the lack of a &#8216;real&#8217; file system. The python path and import control is special as only packages from google, and those in your current app are available, and they are specially managed. This should not affect anyone unless you play funky package import tricks that you should not be doing anyway. Extending __path__ in packages does still work, but using __import__ directly to import a package using a computed abs path does not work (might be a bug).</p>
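<p>For the cPickle point above, the common defensive idiom is to ask for the fast module and fall back to the plain one, so the calling code does not care which it got. This is a generic Python sketch, not something taken from the App Engine docs, and <code>roundtrip</code> is just an illustrative helper.</p>

```python
try:
    import cPickle as pickle  # C-accelerated module on classic Python 2
except ImportError:
    import pickle  # pure-Python fallback; also the only name on Python 3


def roundtrip(value):
    """Serialize and restore a value with whichever pickle module loaded."""
    return pickle.loads(pickle.dumps(value))


print(roundtrip({"tags": ["python", "people"]}))
```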
<p>10. Quotas and App shutdown</p>
<p>If your app gets too popular and goes over quotas, then it is disabled. Once it gets too popular, you need to buy more computes, etc. None of the quotas are set in stone yet, and of course if you use google analytics and/or AdSense, then the quotas are less restrictive or removed. The details are still in flux. For the beta period you can request a larger quota for free (but each request is reviewed for merit). You can also report app abuse if you find that someone&#8217;s app is not being nice.</p>
<p>[UPDATE: Here is the link to the current <a href="http://code.google.com/appengine/articles/quotas.html">quota system</a>.]</p>
<p>11. Only 3 Apps and no deleting.</p>
<p>For the beta period, each developer can have just 3 apps, and you can not delete an app.</p>
<p>12. Only pure python</p>
<p>No C extension modules. This is again because of the sandbox system, and all the other stuff above. You can&#8217;t prevent process or thread spawning in a C extension. You can not stop a C extension from corrupting things in very bad ways. You can&#8217;t stop it from attempting to connect out or bind a tcp port. And it would be a PITA to distribute the binaries to all the nodes like they do for the apps themselves (via custom import hooks + caching).</p>
<p>[UPDATE: fixing numbering and adding some other restrictions and errors people have pointed out]</p>
<p>13.  1MB per file upload limit and 500MB total storage limit.</p>
<p>The 500 MB limit is part of the current Quota system, but I was unaware of the 1MB file upload limit that a commenter pointed out.</p>
<p>14. 1000 files in an app limit</p>
<p>This is a huge problem for people trying to deploy pylons, TG, or Django trunk based applications. One potential solution (which is not currently supported) would be for google to allow for python zip imports and have things bundled.</p>
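<p>To show what zip imports look like in stock Python (again, not currently supported on App Engine per the above), here is a small sketch meant to be run locally. The module name <code>bundled_demo</code> and the bundle file name are made up for the example.</p>

```python
import os
import sys
import tempfile
import zipfile


def import_from_zip():
    """Bundle one module into a zip file and import it through sys.path."""
    bundle_dir = tempfile.mkdtemp()
    bundle_path = os.path.join(bundle_dir, "bundle.zip")
    with zipfile.ZipFile(bundle_path, "w") as bundle:
        # One archive on disk can stand in for many individual source files.
        bundle.writestr("bundled_demo.py", "GREETING = 'hello from the zip'\n")
    # Python's standard import machinery treats a zip on sys.path like a directory.
    sys.path.insert(0, bundle_path)
    import bundled_demo
    return bundled_demo.GREETING


print(import_from_zip())
```

<p>A whole framework packed this way would count as a single file against the limit, which is why the idea is attractive.</p>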
<p>15. The Google DataStore has limitations over a classic RDBMS</p>
<p>Ben Bangert has a great <a href="http://groovie.org/articles/2008/04/13/google-datastore-and-the-shift-from-a-rdbms">write up on this</a>, so go read that. <img src='http://dougma.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://dougma.com/archives/84/feed</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>Files, Storage, Google, Python, and UnConference Software</title>
		<link>http://dougma.com/archives/80</link>
		<comments>http://dougma.com/archives/80#comments</comments>
		<pubDate>Wed, 09 Apr 2008 05:18:57 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[olpc]]></category>
		<category><![CDATA[pycon]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[rant]]></category>

		<guid isPermaLink="false">http://www.dougma.com/archives/79</guid>
		<description><![CDATA[Well, this was going to be three or four posts, but thanks to some interesting announcement from google, it all sort of runs together. It still will be I think. I will most likely try to rewrite things to give an overview and go into detail on specifics later. Things are getting interesting at work [...]]]></description>
			<content:encoded><![CDATA[<p>Well, this was going to be three or four posts, but thanks to some interesting announcement from google, it all sort of runs together.</p>
<p>It still will be I think. I will most likely try to rewrite things to give an overview and go into detail on specifics later. Things are getting <a href="http://boston.bizjournals.com/boston/stories/2008/04/07/daily21.html">interesting</a> at work so we will see how much time I have to pull that off.</p>
<p><strong>Files </strong></p>
<p><a href="http://radian.org/notebook/google-datastore">Ivan</a> beat me to the punch on the main gist post. While at PyCon I had the opportunity to chat with Mike Fletcher, another OLPC volunteer whose name I forget, Phil Hassey, Richard Jones, Jeff Rush, and about 5 other people who wandered in and out of the small sprint room we were all half passed out in. People came and went during the discussion (I believe Richard and Phil went off to play a board game at one point as well) which ranged from modern Sci-Fi offerings to games to global warming being a net win for Canada to the history of the world (not the movie). I should have gone to bed well before the discussion started. The discussion turned to the object store on the OLPC platform. Jeff, coming from a ZODB background, was quite pro object store systems replacing &#8216;file systems&#8217;. This is a hot button topic with me. This topic has come up at every professional job I have had going all the way back to when I was a co-op at Motorola as a &#8216;Document Administrator&#8217; (secretary). In fact the only two topics which are more hot button for me are &#8216;common application UI frameworks&#8217; and &#8216;<a href="http://www.news.com/8301-10784_3-9914240-7.html?tag=nefd.lede">security after the fact</a>&#8217;. I first started thinking about this subject back in &#8217;93 when I first started working on MUDs <em>(warcraft, only 100% text for you youngins)</em>. The world was editable online (like a lisp MUSH) but also had revision history (via RCS initially). We were dealing with &#8216;serialization&#8217; and how objects were managed. I fell in love with the idea that everything could be described as having a set of attributes (tags) and really you wanted to store and manage these things by those attributes. Permissions were nothing more than attributes. Actions were nothing more than attributes. Meta data by definition were just attributes.
We struggled with systems for this, but I came away convinced that we needed a new paradigm in object storage, and this &#8216;file&#8217; stuff was running on borrowed time. It came up again at Motorola for document management. It came up again at OpenVision (later Veritas) for backup and security compliance. It came up again with <a href="http://en.wikipedia.org/wiki/Rational_ClearCase">ClearCase</a> and Derived Objects. It came up again with &#8216;dictionaries&#8217; and data management for VoiceXpress. And the code base I currently work on has something called &#8216;DFiles&#8217; which I can not discuss except to mention the name (DRAT!)</p>
<p><strong>Storage</strong></p>
<p>Back to the discussion at PyCon. I wish I had a transcript of the discussion <em>(no I don&#8217;t&#8230; I was not as coherent as I think I was)</em>. The idea that everything is just blobs in a cloud of data where the tags determine the meta-structure is nice, but there are some problems. The first and most obvious problem is that it does not integrate well with existing technology and libraries. Decades of software have been written around the concept of files. You can try a fake mapping, and try to integrate things, but it does not work well. Then there is the concept of &#8216;sub-blobs&#8217;. That is, each piece of data could have sub-parts. This maps well to your document, which might have a chart or spreadsheet as part of it, for instance. This can greatly simplify serialization, and you get all those nice blob store things. Your in-memory structure is your serialization structure. But in reality we already have this. They are called files and directories. It is simply <em>(*cough*)</em> an implementation detail dealing with the storage mapping. Ok, there is nothing simple about it, but we will come back to this. The argument then turned to the fact that you can&#8217;t have a blob show up in more than one directory. False. Those are called symlinks, but again that is an implementation detail. One of the biggest benefits of an object-store-as-filesystem is the ability to find and manage things without a rigid tree structure, which does not scale well in the average human brain (where did I put my <em>(ssh)</em> keys again?). But in practice it just replaces one confusing arbitrary structure with another on some level, as its usefulness is measured by the quality of the tags, attributes, and indelibility of the data. If you had those things well defined in a directory tree structure, it would work just as well (as Google Desktop Search proves). A more subtle problem is that not all tags/attributes are created equal. 
It took a long time for my betters and practical experience to prove this to me. Many attributes are only useful to programs. These programmatic tags are for relating data, validation, encoding, and the like. Most of the time these are auto-generated or involve mathematical computations. They are never intended for human interpretation, but are nonetheless crucial for data management. You can try to predetermine the different types of these meta attributes, or just lump them together, but neither of these approaches is really tractable. Spend some time deep diving into the abuses of the Windows registry and you begin to get an idea of the issues.</p>
<p>I know I am glossing over all the details, and not really giving any points the attention they deserve. I am not even properly quantifying the points. Issues of language are completely being skipped over (try describing what a &#8216;word&#8217; is in your application; try again when that application deals with speech and natural language&#8230; how does that abstract into meaningful tags?). Oh well. The point is there must be a happy medium. We should be able to have something which has a file system programmatic interface, as well as a generic data store interface. The browsing of the data should be an abstraction. Whether this is implemented with a classic journaling file system or in a database should be an implementation detail at the filesystem level. Why invent a new abstraction layer which everyone must now implement against when we have a perfectly good one that everyone already does? A file by any other name still contains data. If this is such a good model, think about extending it to namespaces. The problems in software code management (which is just data on a very real level) for which namespaces were invented exist on the filesystem as well. Chew on that while you code with Matrix.Optimizer and Optimizer.Matrix.</p>
<p><strong>Google</strong></p>
<p>Google has an interesting take on all of this. All of their services (news, documents, reader, calendar, mail, blogger, etc.) have a file-like data storage for the objects represented. They use folders/directories (really tags). The only restriction is that the folders are only one level deep. I do not care for this myself. I would love to be able to have a &#8216;people&#8217; folder under my &#8216;python&#8217; folder and have only those items tagged with both &#8216;python&#8217; and &#8216;people&#8217; under that &#8216;folder&#8217;. Maybe that is just me. I would not want these sub folder relations to be automatic. I would want control over the layout, but have the population automatic. But that is the only extension to their system I would like to see. Beyond that it just works. It works with both the object store model and the file/directory model. If only Google would open up their APIs a bit more to include this system. Oh wait, <a href="http://code.google.com/appengine/docs/datastore/">they just did</a>. You know if I had hit &#8216;publish&#8217; on this post last evening when I first wrote most of this, I would have been &#8216;prophetic&#8217; or at least &#8216;first post!&#8217;.</p>
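<p>To make the nested-folder wish concrete, here is a minimal sketch. It assumes a made-up in-memory store (the <code>items</code> dict and the <code>folder_contents</code> helper are hypothetical names, not anything Google provides). The layout, meaning which paths exist, stays under my control, while the population of each &#8216;folder&#8217; is computed as the intersection of the tags in its path:</p>

```python
# Hypothetical in-memory tag store; names and data are illustrative only.
items = {
    "pycon-notes.txt": {"python", "people"},
    "django-tips.txt": {"python"},
    "contacts.txt": {"people"},
}

def folder_contents(path, items):
    """Treat a path like 'python/people' as the intersection of its tags."""
    wanted = set(part for part in path.split("/") if part)
    return sorted(name for name, tags in items.items() if wanted.issubset(tags))

print(folder_contents("python/people", items))  # only items carrying both tags
print(folder_contents("python", items))         # everything tagged 'python'
```

<p>Because the lookup is a set intersection, &#8216;python/people&#8217; and &#8216;people/python&#8217; hold the same items; only the layout I choose to expose differs.</p>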
<p>It&#8217;s not all hearts and ponies and sparkle (even if it is Python and an abstraction layer on top of Django to boot!!!). I have been holding off on posting this err&#8230; post until I could formulate a non-reactionary opinion on the entire Google Apps thing. I now have an opinion and it is much along the same lines as that of <a href="http://oubiwann.blogspot.com/2008/04/problem-with-and-solution-to-google-app.html">Duncan McGreggor</a>. The issues I have are both similar to and distinct from his, and I will post on them separately.</p>
<p><strong>Python, and Conference Software</strong></p>
<p>This post is already too long, and my laptop battery is dying (no, the charger is at work <img src='http://dougma.com/wp-includes/images/smilies/icon_sad.gif' alt=':-(' class='wp-smiley' />  ). Those of you that I talked with at PyCon about UnConference hosting know what this is all about, and I told you so <img src='http://dougma.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> . The last piece just fell into place. With that, good night <img src='http://dougma.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://dougma.com/archives/80/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>This Week in Django on PyCon2008</title>
		<link>http://dougma.com/archives/68</link>
		<comments>http://dougma.com/archives/68#comments</comments>
		<pubDate>Mon, 10 Mar 2008 19:22:33 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[pycon]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dougma.com/archives/65</guid>
		<description><![CDATA[And vice versa. I had a wonderful talk with Michael Trier and Brian Rosner who do the &#8216;This Week in Django&#8216; podcast. Lots of useful information, check it out. (And you get to hear my voice&#8230;.) UPDATE: Here is the image I mentioned in the podcast about [name withheld] watching a presentation Guido gave at [...]]]></description>
			<content:encoded><![CDATA[<p>And vice versa.</p>
<p>I had a wonderful talk with Michael Trier and Brian Rosner who do the &#8216;<a href="http://blog.michaeltrier.com/2008/3/10/this-week-in-django-14-2008-03-09">This Week in Django</a>&#8216; podcast. Lots of useful information, check it out. (And you get to hear my voice&#8230;.)</p>
<p>UPDATE: Here is the <a href="http://www.jafo.ca/oldphotoblog/images/200702/sw-20070224-04.jpg">image</a> I mentioned in the podcast about [name withheld]  watching a presentation Guido gave at Google, on Google Video, at PyCon 2007, while Guido was giving a talk. Thanks again to <a href="http://www.jafo.ca/">Sean</a> of <a href="http://www.tummy.com/">Tummy.com</a> for doing the networking last year, and making it so that people could do things like this and not have it affect the network!</p>
<p><img src="http://www.jafo.ca/oldphotoblog/images/200702/sw-20070224-04.jpg" /></p>
]]></content:encoded>
			<wfw:commentRss>http://dougma.com/archives/68/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PyCon08 Registration Blues</title>
		<link>http://dougma.com/archives/67</link>
		<comments>http://dougma.com/archives/67#comments</comments>
		<pubDate>Thu, 14 Feb 2008 07:25:32 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[pycon]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dougma.com/archives/64</guid>
		<description><![CDATA[First off, early-bird registration ends in less than a week. This year has a fantastic lineup, and over twice the tutorials of any previous year. Not to mention the sprints which are free to the public. We have a new registration system this year. It has its quirks, and there have been some issues. Most [...]]]></description>
			<content:encoded><![CDATA[<p>First off, <a href="http://us.pycon.org/2008/registration/" target="_blank">early-bird registration</a> ends in less than a week. This year has a fantastic <a href="http://us.pycon.org/2008/conference/schedule/" target="_blank">lineup</a>, and over twice the <a href="http://us.pycon.org/2008/tutorials/schedule/" target="_blank">tutorials</a> of any previous year. Not to mention the <a href="http://us.pycon.org/2008/sprints/projects/" target="_blank">sprints</a> which are free to the public.</p>
<p>We have a new registration system this year. It has its quirks, and there have been some issues. Most of these were expected and we had plans in place to deal with them. <a href="http://www.amk.ca/diary/" target="_blank">AMK</a> has taken on the role of the Registration Manager. With things in his more than capable hands everything is running quite smoothly. I can&#8217;t help but stick my nose back in to resolve some issues. To be honest I do not have the temperament (or time) to deal with talking to actual people (code is more my thing). But some people I already know, or requests were made and I knew the person was still online. AMK didn&#8217;t help things by documenting some standard responses to some of the <a href="https://pycon.coderanger.net/ticket/198" target="_blank">common problems</a>, which made it too easy for me to &#8216;help out&#8217; <img src='http://dougma.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> . As of this writing we have more registrations than we had last year at the close of the  early-bird period (375 in 07, and 390 right now).</p>
<p>So why do I have the Blues? I can&#8217;t decide which tutorials to take. Work is paying for me to go this year (and has even ponied up for a sponsorship!). That means that they have a say. This is additionally complicated by the fact that I still don&#8217;t have my <a href="http://laptop.org/laptop/" target="_blank">XO</a>, and it&#8217;s questionable that I will get it in time. What this means is I am waffling on whether to take <em>&#8216;<a href="http://us.pycon.org/2008/tutorials/SwigBeazley/">SWIG Master Class</a> (David Beazley)&#8217; </em>or <em>&#8216;<a href="http://us.pycon.org/2008/tutorials/SugarFletcher/">Making Small Software for Small People, Sugar/OLPC Coding by Example</a> (Mike C. Fletcher)&#8217;</em>.  I really want to take the Sugar class. We use SWIG at work, and not the simple stuff either. We don&#8217;t do any of the C++ stuff, but have things like type safety on our constants with custom reprs and the ability to do things like pass in an array of length 0 and say it&#8217;s of length 100; have to test those memory error conditions after all. Callback functions are supported, oh and it&#8217;s thread safe, with all calls releasing the GIL safely. SWIG gets really painful at that level.</p>
<p>In the afternoon it looks like <em>&#8216;<a href="http://us.pycon.org/2008/tutorials/SciComputing/">Tools for Scientific Computing in Python</a> (Travis Oliphant and Eric Jones)&#8217;</em> for me. There were hard choices here: I want to learn wxPython, and the generator tricks look very interesting, but in the end SciPy is the best choice both for work and for fun. With my existing experience I feel safe in skipping the morning companion tutorial. We use the SciPy packages including numpy and matplotlib quite a bit at work but my experience with the latter is only cursory. I hate to admit it, but many times I use win32all to push the data into Excel and generate graphs that way. I have it on my list to write an <a href="http://en.wikipedia.org/wiki/Fast_Fourier_Transform" target="_blank">FFT</a> and a group theory approach to <a href="http://en.wikipedia.org/wiki/Pitch_detection_algorithm" target="_blank">pitch detection</a> using SciPy; yes that is &#8216;fun&#8217;.</p>
<p>The evening is the toughest. This is where PyCon-Tech related tutorials come to the fore. It is down to <em>&#8216;<a href="http://us.pycon.org/2008/tutorials/AgileWebTesing/">Practical Applications of Agile (Web) Testing Tools</a> (C. Titus Brown and Grig Gheorghiu)&#8217;</em> and <em>&#8216;<a href="http://us.pycon.org/2008/tutorials/DjangoLab/">Django Code Lab</a> (Jacob Kaplan-Moss, Adrian Holovaty and James Bennett)&#8217;</em>. I would love to go to the code lab and dive into parts of PyCon-Tech. The hard part would be selecting something small enough to discuss. PyCon-Tech has grown quite large and some of the parts are quite involved. Just giving an overview of the registration system can take 3 hours (from actual experience). There are parts of the proposal system which need optimizing (but again explaining the issues alone could take too long). The real problem I would love help with is the separation of the display (html templates) from the core of what I now think of as the PyCon-Tech framework. We have a new design, but once again the design is coupled with the implementation. Swapping in another design is easier than last year, but harder than it was just two months ago. Then there is the 10K lb. gorilla in the corner. PyCon-Tech has 0 automated testing. Guess I will have to go for the testing tools, and hope to snare the Django folks some other time. To be honest I feel weird with the idea of bringing PyCon-Tech stuff to a tutorial; as if I am stealing time from attendees for conference related stuff.</p>
<p>At some point I need to do a post on the registration system&#8230; The idea was to keep it simple. Just a minor update to the old cgi form with the data stored in a database instead of a text file. Three hour chat to give an overview&#8230;. For the masochistic, you can read the comments in the source code <a href="https://pycon.coderanger.net/browser/django/trunk/pycon/attendeereg/models.py" target="_blank">here</a> and <a href="https://pycon.coderanger.net/browser/django/trunk/pycon/attendeereg/views.py" target="_blank">here</a> (which are incomplete) and some other notes <a href="https://pycon.coderanger.net/wiki/PyCon08/RegistrationManagement" target="_blank">here</a> and <a href="https://pycon.coderanger.net/wiki/PyCon08/RegistrationManagement/ToDo" target="_blank">here</a> (a little out of date).</p>
]]></content:encoded>
			<wfw:commentRss>http://dougma.com/archives/67/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Spotlight PyCon-Tech: Google Charts!</title>
		<link>http://dougma.com/archives/59</link>
		<comments>http://dougma.com/archives/59#comments</comments>
		<pubDate>Tue, 11 Dec 2007 07:46:11 +0000</pubDate>
		<dc:creator>doug</dc:creator>
				<category><![CDATA[django]]></category>
		<category><![CDATA[pycon]]></category>
		<category><![CDATA[python]]></category>

		<guid isPermaLink="false">http://www.dougma.com/archives/56</guid>
		<description><![CDATA[After finally being done with the talk proposals on Sunday, I decided to take a break and read some of those Python blog thingies to see what the clean people are up to. I was hoping to find a few topics of minutia I could lose myself in. What I found were a ton of [...]]]></description>
			<content:encoded><![CDATA[<p>After finally being done with the talk proposals on Sunday, I decided to take a break and read some of those Python blog thingies to see what the clean people are up to. I was hoping to find a few topics of minutia I could lose myself in. What I found were a ton of posts on the new <a href="http://code.google.com/apis/chart/" target="_blank">Google Chart API</a>. Now I have been looking for a good chart solution for quite some time. I even discussed certain options over at <a href="http://gulopine.gamemusic.org/2007/11/data-visualization-in-django-dream.html" target="_blank">Marty Alchin&#8217;s</a> blog. I have looked at <a href="http://dojotoolkit.org/projects/dojox#dash-dojox-charts" target="_blank">DojoX Charts</a>, and <a href="http://teethgrinder.co.uk/open-flash-chart/" target="_blank">Open Flash Chart</a>, and many others. Dojo has the best charts, but the API is a PITA to figure out. The docs are mostly auto-generated from the code, and the samples are next to useless. I know I can do great things with it, but I recently spent 3 hours on it and got nowhere. Open Flash Chart got me up and running in no time, and has some nice Python bindings, but it&#8217;s a Flash-based solution. So when I saw the <a href="http://code.google.com/apis/chart/" target="_blank">Google API</a>, I just dove into its docs to see what is up, and forgot all about the blogs. Let&#8217;s see what happened next shall we?&#8230;</p>
<p><span id="more-59"></span></p>
<p>Within 5 minutes I had broken out Wing IDE and was coding frantically. It was a sheer joy to think about. The API is simple, and robust, and I knew exactly the chart I wanted to make. Not a simple dinky chart, but a full-fledged stacked area line chart with color for multiple data plots and a ton of data for each plot. One of the additions to the proposal system is a complete change history on the proposals. This is implemented with the Django admin log, so that even changes made in the admin are captured (though with less detail). You can perform one of three actions: add, change, and delete. We have four primary objects in the proposal system: proposals, reviews, comments, and attachments. New attachments are considered proposal edits for the graph, and you cannot edit or remove comments or attachments. You cannot delete anything. So this broke down into 5 data plots: new proposal, edit proposal, new review, edit review, comment. I wanted to show each of these changes on a per day basis, with the space accounting for the total number of edits having occurred to that point in time.  In short I dove into one of the harder graphs. In truth the real work was generating the data I wanted to plot. The Google Chart API is a generic plotting API, so you need to scale and convert your data to match the type of graph you want to plot. Not a big deal, and anyone who has done any real work with plotting packages can do this blindfolded, and the Google API is so simple it makes it fun.</p>
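<p>The data preparation described above can be sketched roughly like this. It is not the project&#8217;s actual changelog code: the counts are made up, only three of the five change types are shown, and <code>stacked_cumulative</code> is a hypothetical helper. Each category becomes a running total per day, and each line is stacked on top of the ones before it so the top line is the total number of changes so far:</p>

```python
from itertools import accumulate

# Hypothetical per-day counts for three of the change types (made-up numbers).
daily = {
    "new proposal": [2, 0, 1],
    "edit proposal": [0, 3, 1],
    "comment": [1, 1, 0],
}

def stacked_cumulative(daily, order):
    """Return one running-total line per category, stacked on the previous ones."""
    days = len(daily[order[0]])
    base = [0] * days
    lines = []
    for name in order:
        running = list(accumulate(daily[name]))        # total so far for this type
        base = [b + r for b, r in zip(base, running)]  # stack on the lines below
        lines.append((name, base))
    return lines

lines = stacked_cumulative(daily, ["new proposal", "edit proposal", "comment"])
for name, values in lines:
    print(name, values)
```

<p>With these numbers the top line ends at 9, the total of all changes across the three days. The values still need scaling onto the chart&#8217;s plot range before they go into a URL.</p>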
<p><img src="http://chart.apis.google.com/chart?chtt=PyCon+2008+Talk+Proposal+Change+History&amp;cht=lc&amp;chs=600x300&amp;chdl=Comment%7CEdit+Review%7CNew+Review%7CEdit+Proposal%7CNew+Proposal&amp;chco=EDBD3E,A0AEC1,495E88,EC799A,9F0251&amp;chm=b,EDBD3E,0,1,0%7Cb,A0AEC1,1,2,0%7Cb,495E88,2,3,0%7Cb,EC799A,3,4,0%7Cb,9F0251,4,5,0&amp;chxt=x,y&amp;chxl=0:%7C2007-10-16%7C2007-12-09%7C1:%7C0%7C325%7C650%7C975%7C1300&amp;chd=e:ADAcAoAoAoAvAvAvA7BRBRBRBVBbBeBhBhBkBkBuB6CBCBCHC5C9DTD1EbFKHFJsKILOQGSAUxWsZcZ1arcWdSe3jFnhtSzI0E3B3U3a3a5-84,ADAcAoAoAoAsAsAsA1BLBLBLBOBVBYBbBbBeBeBnBrBxBxB3CqCtDDDfD.EuGZItJGKMN-PpR9TcVmV2WcX3YgZydohUl8rQrzujuzu2u2xa0U,ADAcAoAoAoAsAsAsA1BLBLBLBOBVBYBbBbBeBeBnBrBxBxB3CqCtDDDfD.EuGZIqJDKIN4PjR3TPVXVmWJXbYBZPc.gbk3pvqCsvs.tCtCvZyT,ADAcAoAoAoAsAsAsA1BLBLBLBOBVBYBbBbBeBeBnBnBuBuB0CmCqDADWDvEVFqHbHxI3MAMsOROqPKPWPmP8QCQGQcQ4ReSaSmVNVTVXVXXrah,ADAJAJAJAJAJAJAJAMAPAPAPASAWAWAZAZAcAcAlAlAoAoAvAvAyBFBVBeB3C2D1ECExGyG.G.G.G.G.G.G.G.G.G.G.G.G.G.G.G.G.G.G.G.,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" /></p>
<p>With that done, I decided to change my code a bit to add the ability to restrict the time frame, so I could see the post-deadline changes only:</p>
<p><img src="http://chart.apis.google.com/chart?chtt=PyCon+2008+Talk+Proposal+Change+History+Post+Submission+Deadline&amp;cht=lc&amp;chs=600x300&amp;chdl=Comment%7CEdit+Review%7CNew+Review%7CEdit+Proposal%7CNew+Proposal&amp;chco=EDBD3E,A0AEC1,495E88,EC799A,9F0251&amp;chm=b,EDBD3E,0,1,0%7Cb,A0AEC1,1,2,0%7Cb,495E88,2,3,0%7Cb,EC799A,3,4,0%7Cb,9F0251,4,5,0&amp;chxt=x,y&amp;chxl=0:%7C2007-11-20%7C2007-12-09%7C1:%7C0%7C325%7C650%7C975%7C1300&amp;chd=e:SAUxWsZcZ1arcWdSe3jFnhtSzI0E3B3U3a3a5-84,PpR9TcVmV2WcX3YgZydohUl8rQrzujuzu2u2xa0U,PjR3TPVXVmWJXbYBZPc.gbk3pvqCsvs.tCtCvZyT,MsOROqPKPWPmP8QCQGQcQ4ReSaSmVNVTVXVXXrah,G.G.G.G.G.G.G.G.G.G.G.G.G.G.G.G.G.G.G.G.,AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" /></p>
<p>The <a href="https://pycon.coderanger.net/browser/django/trunk/pycon/propmgr/changelog.py?rev=400#L218" target="_blank">code to generate the graphs</a> is now part of the project, and I will integrate it somehow for next year. In total I spent about an hour, and it was quite an enjoyable hour. With that done, I showed off my work to a few people, and then went back to the blogs to see what people were saying about the API.</p>
<p>What I read shocked me. There were a number of people who really liked the API, but I could not find anyone who had actually bothered to fully read the documentation or use it! There were complaints about negative numbers and hidden or undocumented features, and other garbage that is not really worth discussing. There were a few people who &#8216;got it&#8217;, and in general people thought it was cool and interesting. There were a few rants that are not worth mentioning.</p>
<p>So I think I will tackle the biggest misconception I have seen thus far. The Google Charting API is NOT for plotting your raw data points! It does not deal with dates. It does not deal with scales. It does not deal with negative numbers, log scales, or fancy data; but it CAN plot them! Why? Because it is just a basic plot package, and it has to deal with the restrictions that are placed on URLs, as that is the data transmission layer. This is what I mean. Say you have data which ranges from 0 to 10, and you send Google that data to plot. It will plot it all right, but it will only plot it in the bottom 10% of the graph. You need to scale the data up to one of <a href="http://code.google.com/apis/chart/#chart_data" target="_blank">three plot ranges</a>. The docs go out of their way to highlight the distinction between actual and plotted data. These ranges are determined by the limitations of the URL. The first one is the &#8216;<a href="http://code.google.com/apis/chart/#simple" target="_blank">simple</a>&#8217; 0-61 scale, supported by the simple encoding. This is the best way to get a lot of data points sent to Google, because it only requires a single character per data point. If you don&#8217;t mind the simple resolution of only 62 discernible points on the Y axis, this is for you. The second is the &#8216;<a href="http://code.google.com/apis/chart/#text" target="_blank">text</a>&#8217; encoding; a 0-1000 range using essentially percentage notation with one decimal place. The docs say 0.0 -&gt; 100.0, but 1K by any other name is still 1K. As this requires a whopping 5 characters per data point on average (have to count the separator), I see no reason to ever use this, but I understand that it is good for people who want to be able to read and hand type the chart data. 
The last is the &#8216;<a href="http://code.google.com/apis/chart/#extended" target="_blank">extended</a>&#8217; encoding, a two-character, base64-style scheme which allows for a range from 0-4095. Depending on the encoding you use, you must rescale your data to match. So if you have negative values in your data, you must shift and rescale your data, and <a href="http://code.google.com/apis/chart/#grid" target="_blank">draw a new X axis line</a> on the chart where your scaled 0 value is. Why not have the API support math conversions? Because there are so many, and because of the URL encoding. You have a limited number of characters in a URL, and you need to optimize for that.</p>
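<p>As a rough side-by-side of the three encodings, here is a small sketch. The function names are my own, it assumes non-negative integer data with a known maximum, and the <code>s:</code> and <code>t:</code> prefixes are written from my reading of the API docs (the <code>e:</code> prefix is the one visible in the chart URLs in this post):</p>

```python
SIMPLE = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
          "abcdefghijklmnopqrstuvwxyz"
          "0123456789")          # 62 symbols: values 0..61
EXTENDED = SIMPLE + "-."         # 64 symbols per character

def scale(value, top, data_max):
    """Rescale an integer from 0..data_max onto the integer range 0..top."""
    return value * top // data_max

def simple(values, data_max):
    # one character per data point
    return "s:" + "".join(SIMPLE[scale(v, 61, data_max)] for v in values)

def text(values, data_max):
    # percentages with one decimal place, comma separated
    return "t:" + ",".join("%.1f" % (v * 100.0 / data_max) for v in values)

def extended(values, data_max):
    # two characters per data point, 0..4095
    out = []
    for v in values:
        hi, lo = divmod(scale(v, 4095, data_max), 64)
        out.append(EXTENDED[hi] + EXTENDED[lo])
    return "e:" + "".join(out)

data = [0, 5, 10]
print(simple(data, 10))
print(text(data, 10))
print(extended(data, 10))
```

<p>For this sample data the three lines printed are <code>s:Ae9</code>, <code>t:0.0,50.0,100.0</code> and <code>e:AAf...</code>, which shows the trade-off: one character per point, roughly five, or two.</p>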
<p>Let&#8217;s dive deeper into the &#8216;<a href="http://code.google.com/apis/chart/#extended" target="_blank">extended</a>&#8217; encoding. This is most likely the only encoding I will ever use. Unfortunately the JavaScript sample code it comes with is complete garbage, and at first blush looks broken, limiting people to only 3844 unique values. I fear it will turn people off of what is a very simple encoding. Let&#8217;s walk through an evolution of encoders in Python <em><strong>[NOTE: all code on this page is in the public domain]</strong></em>:</p>
<pre>
GC_EXTENDED_MAP = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789-.')
# num must already be an integer in 0..4095; // keeps the index an integer
gc_extended = lambda num: GC_EXTENDED_MAP[num//64]+GC_EXTENDED_MAP[num%64]</pre>
<p>Here we have a very simple number to two-character Google extended encoding. This assumes that the number has already been scaled to between 0 and 4095. Yup, that&#8217;s it folks. So now what we want is to have something deal with the scaling for us.</p>
<pre>
GC_EXTENDED_MAP = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789-.')

def gc_extended(num, max=4095):
    # scale an integer in 0..max up to 0..4095 (integer division)
    scaled = (num*4095)//max
    return GC_EXTENDED_MAP[scaled//64]+GC_EXTENDED_MAP[scaled%64]</pre>
<p>This is better, but there are still some issues. The first one I will tackle is the &#8216;rounding&#8217; issue. Now the sample JavaScript code that Google provides uses round. I personally do not like JavaScript&#8217;s round implementation as I have found it to be very slow on IE 6. I have no clue why. I also rarely use round in Python when dealing with integer numbers. This is mainly due to laziness, but more on that in a bit. Unless you have a raw data range of around 6K or greater, you do NOT need to use round. This is because you are scaling up, and the difference of 1 is most likely going to be well within your delta error, or below your precision anyway. But let&#8217;s say you do want to deal with rounding issues, and we can deal with floating point data at the same time:</p>
<pre>
    scaled = int(((num*4095.0)/max)+0.5)</pre>
<p>Done. Cheap and sleazy round and no &#8216;import math&#8217; needed. This is also faster in JavaScript on IE 6; MUCH faster (no clue why, maybe their round is a wrapper for fround?). Ok, back to some real issues. Many times you will want to plot negative numbers. You could shift your data and then scale, or you could manage that as part of the encoding. Managing it as part of the encoding is slower in some respects (as you are repeating math), but it does simplify the code a bit.</p>
<pre>
def gc_extended(num, min=0, max=4095):
    # shift by min, then scale the min..max range onto 0..4095
    scaled = ((num-min)*4095)//(max-min)
    return GC_EXTENDED_MAP[scaled//64]+GC_EXTENDED_MAP[scaled%64]</pre>
<p>Now we are getting somewhere. But there is one last issue. We may not know the full range of values, but instead want to ceil/floor errant values. For example, when dealing with signal processing, I know the signal will normally be within a given range (+/-10 dB), but I also know that sometimes plugging in or unplugging equipment can cause measurement spikes in the data at 100 dB. These values are &#8216;real&#8217; in that they happened, but would skew the graph. We want to peg those to the local min/max:</p>
<pre>
def gc_extended(num, min=0, max=4095, floor=0, ceil=4095):
    # peg errant values before scaling
    if num &lt; floor: num = floor
    if num &gt; ceil: num = ceil
    scaled = ((num-min)*4095)//(max-min)
    return GC_EXTENDED_MAP[scaled//64]+GC_EXTENDED_MAP[scaled%64]</pre>
<p>There, that simplifies things, and with floor and ceil as pre-scaled values, the API is simpler (if a bit slower). One last thing to deal with. Many times you need to deal with missing data points. Say you are plotting two lines with data collected at different points on the X axis. You have three options. You can give Google the data for both X and Y in pairs, or you can give the data for just Y and fill in the missing points with the <a href="http://code.google.com/apis/chart/#line_charts" target="_blank">&#8216;missing&#8217; data marker</a> &#8216;__&#8217; (the simple and text encodings also support missing data), or a combination of both. Choosing well can drastically reduce the length of the URL encoding the data: supply pairs if the union set of missing unique x values is large, or add missing markers if it is small.</p>
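<pre>
GC_EXTENDED_MAP = (
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    'abcdefghijklmnopqrstuvwxyz'
    '0123456789-.')

def gc_extended(num, min=0, max=4095, floor=None, ceil=None, missing=None):
    if num == missing: return '__'
    # values outside min..max are pegged to floor/ceil (default: min/max)
    if num &lt; min: num = min if floor is None else floor
    if num &gt; max: num = max if ceil is None else ceil
    # int() keeps the index an integer even when min/max are floats
    scaled = int(((num-min)*4095)//(max-min))
    return GC_EXTENDED_MAP[scaled//64]+GC_EXTENDED_MAP[scaled%64]</pre>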
<p>There; done. That is all the Python code for dealing with encoding any range of numbers onto a scaled Google chart extended-encoded number. You can have your &#8216;missing&#8217; data point be -1 and it will still work just fine. You can ceil and floor, and deal with all the rest. If you use the Django curry utility, you can have even MORE fun! Let&#8217;s take a look:</p>
<pre>
from django.utils.functional import curry  # functools.partial works the same way here

def encode_data(raw_data):
    """encode the data for plotting 5 standard deviations from the average
       computed with missing data treated as 0.0 for avg compute
       (common bell curve compute)
    """
    # standard_deviation() is assumed to be defined elsewhere
    dev = standard_deviation(raw_data) * 5
    avg = sum(x for x in raw_data if x != -1)/len(raw_data)
    encoder = curry(gc_extended, min=avg-dev, max=avg+dev, missing=-1)
    return ''.join(encoder(num) for num in raw_data)</pre>
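<p>For anyone who wants to try this without Django installed, here is a self-contained sketch using only the standard library. It is an approximation of the idea, not the code above: <code>functools.partial</code> stands in for curry, <code>statistics.pstdev</code> stands in for the undefined <code>standard_deviation</code>, the parameters are renamed <code>lo</code>/<code>hi</code> so the builtins stay usable, and (unlike the version above) missing points are left out of the average entirely:</p>

```python
from functools import partial
from statistics import pstdev

GC_EXTENDED_MAP = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-.")

def gc_extended(num, lo=0, hi=4095, missing=None):
    """Encode one value as two extended-encoding characters."""
    if num == missing:
        return "__"
    num = min(max(num, lo), hi)                   # peg out-of-range values
    scaled = int((num - lo) * 4095 // (hi - lo))  # lo..hi onto 0..4095
    return GC_EXTENDED_MAP[scaled // 64] + GC_EXTENDED_MAP[scaled % 64]

def encode_data(raw_data, missing=-1, spread=5):
    """Plot `spread` standard deviations either side of the mean."""
    present = [x for x in raw_data if x != missing]
    avg = sum(present) / len(present)
    dev = pstdev(present) * spread or 1           # avoid a zero-width range
    encoder = partial(gc_extended, lo=avg - dev, hi=avg + dev, missing=missing)
    return "".join(encoder(num) for num in raw_data)

print(encode_data([10, 12, -1, 14, 12]))
```

<p>The mean lands in the middle of the plot range, the missing point comes out as the &#8216;__&#8217; marker, and anything more than five deviations out is pegged to the edge instead of raising an index error.</p>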
<p>I love Python. Now let&#8217;s bring this full circle back to that blog post Marty made on <a href="http://gulopine.gamemusic.org/2007/11/data-visualization-in-django-dream.html" target="_blank">data visualization in Django</a>. With not too much difficulty we could construct some standard data based graphs for the Django <a href="http://www.djangoproject.com/documentation/databrowse/" target="_blank">DataBrowse contrib app</a>. These would be simple, but extensible graphs. There would be a Django view which generates the Google Chart API URL and then returns a redirect to that URL! No charting packages or JavaScript to write. Just some simple Python, which unlike other charting packages can be checked into the Django contrib, as it does not require any other packages. Another cool use is having dynamic up-to-date charts in your <a href="http://meyerweb.com/eric/tools/s5/" target="_blank">S5 presentation</a>! This is just too cool.</p>
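<p>The URL-building half of such a view might look something like the sketch below. The <code>chart_url</code> helper is hypothetical; the base URL and the <code>cht</code>, <code>chs</code>, <code>chtt</code> and <code>chd</code> parameters are the ones visible in the chart images earlier in this post. A view would then only need to return a redirect to the result, which I have left out so the sketch runs without Django:</p>

```python
from urllib.parse import urlencode

# Base URL as it appears in the chart images earlier in this post.
CHART_BASE = "http://chart.apis.google.com/chart"

def chart_url(title, encoded_series, size="600x300"):
    """Build a line-chart URL from already-encoded extended data strings."""
    params = [
        ("cht", "lc"),                             # line chart
        ("chs", size),                             # width x height in pixels
        ("chtt", title),                           # chart title
        ("chd", "e:" + ",".join(encoded_series)),  # extended-encoded data sets
    ]
    # Keep ':' and ',' literal so the data string stays short and readable.
    return CHART_BASE + "?" + urlencode(params, safe=":,")

url = chart_url("Proposal Changes", ["AAf...", "AAQA.."])
print(url)
```

<p>Leaving the colon and comma unescaped keeps the URL closer to the hand-written ones above and saves two characters per separator, which matters when the URL length is the limiting factor.</p>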
]]></content:encoded>
			<wfw:commentRss>http://dougma.com/archives/59/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

