How to upload data stolen from the Net – for dummies**!

** If you are a cool professional Ruby programmer you might find all of this just a bit greenish.

We all have our pet web apps we work on whenever our day job does not get in the way. I have mine too. Many of these apps integrate various services over the web – like albums on Flickr. Well, every now and then I get a craving to actually put some usable data in my apps. I go around looking for that data on the net, which is followed by a moral debate with myself when I find some. Actually, it is not as bad as it sounds.

There is a lot of data that you can pick up legally – like the nutrient data that the Australian government publishes on one of its websites.

Data comes in various shapes and sizes. Tabular data that can be converted into CSV is sometimes the simplest to load. We have all done it a hundred times in projects. But here is how I did it.

It’s a simple approach. The script is fast enough for me. There are 4000 rows in the file and it takes about 20 seconds to upload them. That works for me.
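The script itself is not shown here, so the following is only a minimal sketch of this kind of load, not the original. It assumes a two-column file with no quoting surprises; the column names (`name`, `energy_kj`) and the `load_rows` helper are hypothetical, and the hashes it builds stand in for whatever model each row would be saved to.

```ruby
require "csv"

# Hypothetical sample rows (no header line); the real file is not shown here.
SAMPLE = "Apple,218\nBanana,385\n"

# Turn CSV text into an array of attribute hashes, one per line.
def load_rows(csv_text)
  CSV.parse(csv_text).map do |name, energy_kj|
    { name: name, energy_kj: energy_kj.to_i }
  end
end

rows = load_rows(SAMPLE)
# In a Rails app, this loop is where each hash would be saved to a model.
rows.each { |row| puts "#{row[:name]}: #{row[:energy_kj]} kJ" }
```

Parsing everything first and saving afterwards keeps the parsing easy to test without a database.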

You can of course avoid the first line using

You can also use FasterCSV and then avoid the first line using
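The snippets for skipping the first line are not shown here. As a hedged sketch, these are two common ways to do it with Ruby's standard `csv` library (FasterCSV became the standard CSV library in Ruby 1.9, so the same options apply). The sample data and helper names are hypothetical.

```ruby
require "csv"

# Hypothetical sample with a header line.
WITH_HEADER = "name,energy_kj\nApple,218\nBanana,385\n"

# Option 1: parse everything, then drop the first (header) row.
def rows_without_header(csv_text)
  CSV.parse(csv_text).drop(1)
end

# Option 2: let the parser consume the header and key each row by column name.
def rows_by_column(csv_text)
  CSV.parse(csv_text, headers: true).map(&:to_h)
end

p rows_without_header(WITH_HEADER)
p rows_by_column(WITH_HEADER)
```

Option 2 is usually easier to maintain, because the code keeps working if the columns are reordered.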

That brings us to our next topic

Other alternatives to upload CSV

Here are some of the other alternatives…

  • Ruby String#split (slow)
  • Ruby CSV (slow)
  • FasterCSV (slow)
  • ccsv (fast & recommended if you have control over CSV format)
  • CSVScan (fast & recommended if you have control over CSV format)
  • Excelsior (fast & recommended if you have control over CSV format)

This data can be found here. That website also has some benchmarks of how fast each of the parsers is.
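If you would rather measure on your own data, here is a rough sketch that times two of the listed approaches with Ruby's `Benchmark` module. The synthetic data and helper names are assumptions, and the numbers it prints depend entirely on your machine and Ruby version, so treat them as indicative only.

```ruby
require "benchmark"
require "csv"

# Build synthetic CSV text: simple rows with no quoting or embedded commas.
def sample_csv(row_count = 4000)
  (1..row_count).map { |i| "food#{i},#{i}" }.join("\n")
end

# Time String#split and the CSV library on the same text.
# Returns elapsed seconds for each approach.
def compare_parsers(csv_text)
  {
    split: Benchmark.realtime { csv_text.each_line.map { |line| line.chomp.split(",") } },
    csv: Benchmark.realtime { CSV.parse(csv_text) }
  }
end

compare_parsers(sample_csv).each do |name, seconds|
  puts format("%-6s %.4fs", name, seconds)
end
```

Note that `String#split` only gives the same rows as a real parser when no field contains a quoted comma or a line break.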

I like the simplicity of

What about other non-tabular data?

Well there is much to be had over the web. There is the RSS feed that you can do a lot with if a website is publishing an RSS. If not, well, there is screen scraping! Ever heard of that beast?! This is how I parsed and uploaded some RSS data some time ago.
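The original RSS code is not shown here. A minimal sketch of the parsing half, assuming the `rss` library that ships with Ruby (a bundled gem in recent versions) and an RSS 2.0 feed, could look like this. The sample feed, the `entries_from` helper and the hash keys are all hypothetical.

```ruby
require "rss"

# Hypothetical feed; in practice this text would come from the site's feed URL.
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example feed</title>
      <link>http://example.com/</link>
      <description>Sample items</description>
      <item>
        <title>First post</title>
        <link>http://example.com/first</link>
      </item>
      <item>
        <title>Second post</title>
        <link>http://example.com/second</link>
      </item>
    </channel>
  </rss>
XML

# Parse RSS text and return the fields we would store for each entry.
def entries_from(feed_xml)
  feed = RSS::Parser.parse(feed_xml)
  feed.items.map { |item| { title: item.title, url: item.link } }
end

entries_from(SAMPLE_FEED).each { |entry| puts "#{entry[:title]} -> #{entry[:url]}" }
```

Each hash can then be saved the same way as a CSV row.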

Some months later … and with a few more Railscasts under my belt, I would use Feedzirra.

Do have a look at the GitHub page here. You can then set up a call to FeedEntry.update_from_feed("feed url") in your crons, or use Whenever.

By the way, you are probably wasting a bit of bandwidth if you do this, because you download the whole feed when there are ways to fetch only what has changed. You can do this using…
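One standard way to fetch only what has changed is an HTTP conditional GET. That is an assumption about what was meant here, not something the post confirms. The sketch below uses Ruby's `net/http`; the helper names are hypothetical, and it only helps if the server sends `ETag` or `Last-Modified` headers and honours them on later requests.

```ruby
require "net/http"
require "time"
require "uri"

# Build a GET request that asks the server to reply only if the feed changed.
# etag and last_modified are values saved from the previous response.
def conditional_request(url, etag: nil, last_modified: nil)
  request = Net::HTTP::Get.new(URI(url))
  request["If-None-Match"] = etag if etag
  request["If-Modified-Since"] = last_modified.httpdate if last_modified
  request
end

# Send the request; returns nil when the server answers 304 Not Modified,
# otherwise the response body.
def fetch_if_changed(url, etag: nil, last_modified: nil)
  uri = URI(url)
  request = conditional_request(url, etag: etag, last_modified: last_modified)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  response.is_a?(Net::HTTPNotModified) ? nil : response.body
end
```

A cron job would store the `ETag` and `Last-Modified` values from each successful response and pass them back in on the next run.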

Riskometer

More often than not, people want to assess the risk of undertaking an integration engagement. Small teams are called together to do a proof-of-concept.

Such an engagement is what we attempted at one of our clients. Rather than talking about risks in vague terms, we visually represented the movement of identified risk categories on our story wall. The daily standups often included an assessment, with stakeholders, of what the attempted stories had done to minimise the risk associated with each item. Luke has more about it here.

Riskometer

Fun in the times of aggressive schedules and kill-joy servers – part II

This is a continuation of the previous post. I wanted to write about our build and CI. The team was using Hudson for continuous integration (CI). ThoughtWorks had suggested Concordion and Selenium RC. I am a fan of Behaviour Driven Development (BDD). But I think a team requires a certain amount of exposure to automated functional testing before they start wanting tools like Cucumber and Concordion. Unfortunately, the paradigm a lot of teams are used to involves throwing the code ball over the wall to a QA. Developers in such teams lack the patience to work with things like Concordion, more so if it’s a small project. (This is not a judgement on the team at all, don’t get me wrong; it’s just a different development paradigm they are used to.) They are looking for something simpler in its paradigm. The team in Manly was no different.

We used an approach that TW had used before in another project some time ago. So all of this might sound retro-ish, but who cares! Selenium core is a fun tool to use. We created a small DSL in Ruby to write our test scripts – our .test files. Tests ended up looking like this…

Test DSL

The tests were compiled into Selenese FIT fixtures and hot deployed into the server which had Selenium core running together with the actual web app. Anytime you changed or added a script, you ran an ant target to compile and hot deploy. The simplicity of the paradigm meant that everyone, even the IT manager (!), could understand and write one.
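The team's actual DSL is not shown here. To illustrate the idea of compiling a small Ruby script into a Selenese table, here is a hypothetical sketch. The class and its method names (`open_page`, `type_into`, `click_and_wait`, `should_see`) are invented for this example; only the emitted commands (`open`, `type`, `clickAndWait`, `assertTextPresent`) and the three-column table layout come from Selenium core itself.

```ruby
# Hypothetical mini-DSL: each call records one Selenese row
# in the form [command, target, value].
class SeleneseScript
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

  attr_reader :rows

  def self.build(title, &block)
    script = new(title)
    script.instance_eval(&block)
    script
  end

  def initialize(title)
    @title = title
    @rows = []
  end

  def open_page(path)
    add("open", path)
  end

  def type_into(locator, text)
    add("type", locator, text)
  end

  def click_and_wait(locator)
    add("clickAndWait", locator)
  end

  def should_see(text)
    add("assertTextPresent", text)
  end

  # Render the recorded rows as the HTML table that Selenium core reads.
  def to_html
    header = %(<tr><td colspan="3">#{escape(@title)}</td></tr>)
    body = @rows.map do |row|
      "<tr>#{row.map { |cell| "<td>#{escape(cell)}</td>" }.join}</tr>"
    end
    "<table>\n#{[header, *body].join("\n")}\n</table>"
  end

  private

  def add(command, target, value = "")
    @rows << [command, target, value]
  end

  def escape(text)
    text.gsub(/[&<>"]/, ESCAPES)
  end
end

login = SeleneseScript.build("Login") do
  open_page "/login"
  type_into "id=user", "bob"
  click_and_wait "id=submit"
  should_see "Welcome"
end
puts login.to_html
```

A build target would then write each generated table to a file and copy it next to Selenium core on the server.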

There is of course a downside to all of this – there always is. Once you start having hundreds of these scripts, they become difficult to maintain. But in my opinion the simplicity of the process of test creation means that a small team can get started with serious functional testing faster. It’s more important to get started than to embark on a more sophisticated solution from day one.

Well, that’s it folks! We of course took care of issues like uploading a file from a scripted test. I can talk you through the details of that another day. We have created a sample project using selenium core, ant and tomcat here…

selenium core example

We did other interesting things, like being able to pre-populate a PDF form with test data and getting a build dashboard working (thanks to David Yeung) instead of the old traffic lights. (Even though I find the ‘Production’ monitoring on the dashboard positively funny)

Build Dashboard

Fun in the times of aggressive schedules and kill-joy servers – part I

I did some work with a firm in Manly recently. I had a lot of fun there. Not the kind of fun you are thinking of when you think of Manly.

I wrote this one yesterday and it turned out to be longer than expected. No one likes reading a long blog post – we are a generation of impatient scooters. But instead of trying to make it concise and to-the-point I will break it down into parts!!

The start was typical for a new team trying out Agile, testing and continuous integration. The team was wonderfully talented. There was Parth, Bob (whose fingers move on the keyboard faster than anyone I have seen in a long time), Nirmal, Wiskin and the rest. I had a bit more exposure to Agile than the rest of the gang though.

The team had set up iterations and story walls effectively, thanks to Andrew Tam. But there was lots to be done around Testing, Continuous Integration, the build itself – generally to make development simpler and more pleasurable. I had to set up Weblogic and SQL Server on my machine and had to go through numerous steps to set the application up so that I could log in. The heavyweight environment was a kill-joy. I have very strong opinions about software like Weblogic, Websphere, QTP and Silk. Thankfully, a few people in the team had similar opinions.

In the market that I have had experience with, I guess lightweight servers like Tomcat and Jetty do the job just fine. Teams are unburdened and move much faster. Like Bob used to say, we are hired with an understanding that we will be able to fix issues that come up in the applications that we build, and the more complex the deployment environment and tools become, the higher the probability that we will not be able to do so effectively.

So all in all, what we managed to do is simplify things by using lighter servers, lighter functional testing and a simpler build process overall. Our life was uncluttered and we started having more fun in our day job!

We continued on with fun things – getting build dashboards to work, test coverage reports etc. – but that’s a story for another day.

Let me give it another try…

There is quite a power struggle going on between my laziness and the need to publish something in my blog. It’s fun to write about one’s experiences if one is as vain as I am.

But then reality dawns on me whenever I get down to business – the fact that I am lazy. One just can’t escape reality sometimes. Any attempt to escape it leads to demoralising defeats at the hands of an old enemy.  But one should never give up. So here I am, making another attempt to beat the beast once and for all!

Important visualisation tip – for morons like me who have not seen this before

Try Wordle out – tag cloud meets my kind of visual design. Here is one for your WordPress tags too – FatCloud. I was at an agile workshop at one of our clients with Jason Yip. We took down comments people made. The following is what we got (I have removed some of the more incriminating content)…

Wordle Example

Insoshi Vs Community Engine Vs LovdbyLess

The idea is to lift puberun.org from an amateur low-feature website to something akin to Imitate Life or something beyond. puberun.org is doing good right now – not great, good ;) .

It was built over Altered Beast initially. Altered Beast was OK for some time. But I had two issues with it. Firstly, Courtenay seems to have stopped working on it. Secondly, it was always a bit too complex for me – especially the views.

I looked at three Rails options – Insoshi, Community Engine and LovdByLess. This post is a summary of what I found. I guess there are a few main themes when it comes to choosing a framework. Fit-for-purpose would probably be the first concern. Community involvement is very important. There are other concerns like how much testing has gone into the framework, the profiles of the owners and committers etc. Here is what I found on all these themes.

Fit-for-purpose – Community Engine seemed to be the most feature rich. Community Engine is also a plugin and can be used over Rails Engine, which is a definite benefit from what I see. I have been burnt in the past trying to separate my code from the framework’s code and this is a major issue when you are trying to merge with the original repo you forked from. Community Engine has some important features that I have been looking for. I quite like the blog-like discussions on various topics (there are separate forums as well which look like they have been ported from Altered Beast, which is not a very good thing as I see it). There are many small features that are quite likeable. I quite like the ‘metro areas’ feature, the fact that the profile image is cropped etc. All in all it’s pretty feature rich.

On the other hand, I could very well be stuck with bloatware which is very difficult to modify. If I think clearly (which is not something I do often!), I actually do not need many of the features. Insoshi on the other hand has simple code that I can manage very well. There are quite a few obvious features missing – it could have done with Captcha support for sign-ins, it could also have had the concept of forums (there seems to be just one forum and you can create topics in it), it could have had groups (even though Yuki’s fork seems to have this feature; I have to have a look). The fact that the code is simple entices me to it.

Community involvement – Insoshi wins hands down. It is the fourth most forked project on GitHub, which is to say a lot of innovation is happening around it. Is it because it’s a Y Combinator funded open source project?

Profile of the owner – Insoshi again seems to lead the pack – mostly because of Michael Hartl. He is the creator of RailsSpace and wrote “RailsSpace: Building a Social Networking Website with Ruby on Rails”. Insoshi is a two-man company, with Long Nguyen accompanying him.

Test – Don’t really know much about tests, have to have a closer look at the code. Will post a follow up soon.

All in all I am still very confused, even though I lean towards Insoshi.

I managed to learn some from ‘Managing to Learn’

Read Managing to Learn by John Shook yesterday. I would have to say it’s a very impressive book. I have spent some time reading and attending group meetings on problem solving in a lean manufacturing environment. Reading this book, I think, is one of the better ways of initiating yourself in the A3 problem solving technique. (Well, a book – any book – can never really substitute a good mentor, but can of course whet your appetite on A3.)

More than the actual subject matter of the book, what is impressive is the simple and innovative way the book is written. The book is a dialogue between an employee and his boss on how to solve a particular pestering problem in a plant. I will not spoil the fun by talking too much about it. It’s just 138 pages long; go read the book ;)

Thanks Jason Yip for pointing me to it. Yip and I might want to try out some of the techniques talked about. Join us if you are interested.

Full Context – why is it so important?

Puberun.org.au has grown in the meantime. Have a look at it now. It’s got a blog, photo and video sharing etc. Sign up if you want to. You don’t need to be from north east India to do so!

My work with puberun has given me some valuable insight into application usage and usability issues. We build complex applications which can do so much. I have been building them for varied audiences – from operators and technicians in China and Singapore to heavy equipment dealers in the US to investment bankers in Australia. But what use is technology if it’s too complex to use? What use is technology if people do not use it?

The more I think about it the more I am convinced that usability and human-computer interaction is something that I should put my efforts into. Of course there are various reasons why people do not use a technology – some technical, some political and some just sheer aversion to change. Here are 2 probable reasons why people may not use a technical product.

1. Usability – obviously! Technology starts out simple and comes to an inflexion point where there is a demand for more features. This is where things start to go wrong. It often leads to the technology paradox that Donald Norman talks about in his book ‘The Design of Everyday Things’. Technology at that point, rather than simplifying life, seems to make it worse.

2. A lack of full context of what is happening – I have been reading a lot about this lately. I read about the idea in this pioneering paper by Indrani Medhi at Microsoft Research. The paper was an aha! moment for me. It IS really important to understand the full context of how the technology works. A lot of people don’t use a technology not because they don’t want to or need to, not because they don’t understand how to use it, but because they don’t trust it. They also don’t trust it to bring about change in their lives!