Talend to Clojure

I had the displeasure of being introduced to the Talend data tool a few months ago. I hate it. Its source code should be printed out, then burned, and the ashes should be flushed into a cess-pit of nuclear waste. The Talend people peg it as an easy-to-use tool to get data massaged into various forms. The problem is that they seem to make a fair bit of money on teaching people how to use their tool. Since their food consumption and rent payments depend on training programs, there isn’t much incentive to create a tool that’s easy to use, or to encourage actually helpful documentation.

Do you recall the 90s? Back when documentation and how-to guides abounded, and they all went with the format of “Click this menu button. Open this dialog box. Select these options. Click OK.”? That’s the form of help one gets when using Talend. It’s a great tool, I’m sure, if you enjoy being a mouse-wielding “power user”.

Talend is yet another take on the idea behind the Gherkin language. The idea is to get the non-technical people with the domain knowledge to write code-that-isn’t-code. The problem is that this doesn’t play out so well in the real world. In the real world, this just winds up with actual programmers writing the tests. They did this with COBOL back in the day. They said “With COBOL, you don’t need to hire those pesky expensive programmers. You can just get your existing staff of non-technical business users to write their own shit.” How did that play out? Oh right, we now have legacy systems running in banks that no one understands or wants to understand. There are no “new” COBOL programmers. What programmer worth their salt would want to invest any time in learning a dead-but-zombie language? The result of this effort to have a non-technical user-friendly language is that there are now approximately 50 COBOL programmers out there, all within 10 years of retirement age, getting paid a bajillion dollars due to the supply and demand of COBOL programmers. Great job, everyone.

And there I was, a few months ago, dealing with Talend. It took me 3 days to figure out how to read in a CSV, then fiddle around with the results, then print out another CSV. Even then, I only figured it out in 3 days with the help of 5 other people hovering over my shoulder. People do what they are given an incentive to do. Talend makes money on training programs, and so they have an incentive to not make their software easy to use. XKCD has a tech-support flowchart that is pretty accurate. It does not work with Talend. Congratulations, Talend, you are officially xkcd-tech-support-flowchart-proof.

The bright side

So this post is looking pretty dark. But, I am a silver-lining kind of guy. Back when I was working an unskilled labour job and I had a terrible manager, it wasn’t so bad. It was great because I was very highly motivated to find another job and preferably to join the ranks of skilled workers and contribute to society to a greater degree than as an organic automaton. See? I’m a silver-lining kind of guy.

The silver lining here is that I got motivated to do some data transformations using an actual programming language. And since it’s been a while since I did Clojure, I’ve hopped back on that train to do data transformations. Using my public library account, I found this Talend book. So I’m going to go over their exercises in the next little while and do all of them in Clojure. This will give me a chance to re-learn Clojure, and a chance to showcase how easy it is to do these data transformations in a real programming language. With any luck, this might even help people move away from Talend.

Without further ado, the code below is for the exercise in Chapter 3: Transforming XML to CSV.

(ns talend-for-data-integration.chthree-xml-to-csv
  (:require [clojure.xml :as xml]
            [clojure.java.io :as io]
            [clojure.zip :as zip]
            [clojure.data.zip :as datazip]
            [clojure.data.zip.xml :as zip-xml]
            [clojure.data.csv :as cd-csv]
            [semantic-csv.core :as sc]))

;; Parse the catalogue and wrap it in a zipper rooted at <catalogue>.
(def f
  (->
   "/home/ravi/myprogs/GETTINGSTARTEDTOS_DEMO_DATA/SampleDataFiles/Chapter3/catalogue.xml"
   io/input-stream
   xml/parse
   zip/xml-zip))

;; One map per <sku>: child tag -> child text.
(def in-hash
  (for [sku (zip-xml/xml-> f :catalogue :sku)]
    (into {} (for [params (datazip/children sku)]
               [(:tag (zip/node params))
                (zip-xml/text params)]))))

;; vectorize turns the maps into a header row plus data rows for write-csv.
(with-open [out-file (io/writer "outfile.csv")]
  (->> in-hash
       sc/vectorize
       (cd-csv/write-csv out-file)))

It could do with some refactoring, but this is the first real Clojure I’ve done in a year, and it gets the job done. It’s the first time I’ve worked with Clojure’s zippers, and they are somewhat similar to Ruby’s Nokogiri in how they traverse tree data structures. I’m looking forward to working with them as I go through more complicated examples. I’ve also put up a new project on GitHub to track all this work.
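If you want to poke at zippers without the book’s sample data or any extra libraries, here is a minimal sketch of the same traversal using only what ships with Clojure. The inline XML and its id and name fields are made up for illustration (the real catalogue.xml may look different), and sample-xml, child-locs and sku->map are my own hypothetical names, not part of any library. It also assumes each field holds plain text with no nested elements.

```clojure
(ns zipper-sketch
  (:require [clojure.string :as str]
            [clojure.xml :as xml]
            [clojure.zip :as zip])
  (:import (java.io ByteArrayInputStream)))

;; Hypothetical stand-in for catalogue.xml; the real file's fields may differ.
(def sample-xml
  (str "<catalogue>"
       "<sku><id>1</id><name>Widget</name></sku>"
       "<sku><id>2</id><name>Gadget</name></sku>"
       "</catalogue>"))

;; Parse the string and wrap the element tree in a zipper.
;; The root location points at the <catalogue> element.
(def root
  (-> (.getBytes ^String sample-xml "UTF-8")
      (ByteArrayInputStream.)
      xml/parse
      zip/xml-zip))

(defn child-locs
  "Zipper locations of the direct children of loc, left to right."
  [loc]
  ;; zip/down moves to the first child, zip/right to the next sibling;
  ;; both return nil when there is nowhere left to go.
  (take-while some? (iterate zip/right (zip/down loc))))

(defn sku->map
  "Map of child tag -> text for one <sku> location.
  Assumes every child holds plain text only."
  [sku-loc]
  (into {}
        (for [field (child-locs sku-loc)
              :let [node (zip/node field)]]
          [(:tag node) (str/join (:content node))])))

;; One map per <sku> under the root.
(def rows (map sku->map (child-locs root)))

(prn rows)
;; => ({:id "1", :name "Widget"} {:id "2", :name "Gadget"})
```

In the code from the post, xml-> and datazip/children do the job of child-locs, and zip-xml/text replaces the str/join on :content.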

Back to Emacs

A few months ago I came across this post, and since then I’ve backed off from using RubyMine. Here was a Ruby core developer using logging for debugging, while I was using a full-fledged IDE. It was humbling. In the intervening months I’ve switched back to using Emacs and I’ve started building a bunch of methods to quickly iterate when debugging either with the debugger or with the logging system. I’m not as good at it as I would like to be, but I’m happy with my progress. And it’s good to be back to using Emacs and regular Unix tools again. It feels good to have all that expressiveness at my fingertips.

I can see myself using an IDE in the future, though, if I were to program in a statically typed language. I was doing a bit of meta-programming in Ruby, and found that RubyMine provided nearly no advantages for that over using Emacs. It’s too bad; I really did enjoy RubyMine the few times I used it for mass refactorings.
