Yesterday, after a beautiful (and greedy) Christmas, I decided to start learning the basics of some new tech. My first choice was the Python programming language: it seems quite similar to Ruby, several of my friends are skilled with it, and it is widely used at Google and with Spark, so it is probably one of the best languages to learn at the moment.

I’m quite fluent in Ruby, PHP and JavaScript, but I have no Python skills. I chose Codecademy for a basic introduction and I’m about 50% of the way through the course. Here is what I have learned so far.

Python is dynamically typed (you don't have to declare a variable's type), and its basic types are much the same as in Ruby and PHP:

my_int = 42
my_float = 108.0
my_string = "The answer to life, the universe and everything"
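A small sketch of what "dynamically typed" means in practice: the type belongs to the value, not to the variable, and the built-in type() reports it at runtime. The name value below is just an illustrative variable, not something from the course.

```python
my_int = 42
my_float = 108.0
my_string = "The answer to life, the universe and everything"

# type() reports the runtime type of each value
print(type(my_int).__name__)     # int
print(type(my_float).__name__)   # float
print(type(my_string).__name__)  # str

# The same name can later be rebound to a value of a different type
value = 42
value = "now a string"
print(type(value).__name__)      # str
```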

There are no curly braces and no begin/end keywords: blocks are defined entirely by indentation (I absolutely love it). Functions are defined as follows:

def my_function(argument):  # the colon starts a new indentation level
    return argument * 2
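Calling the function above shows how dynamic typing plays out: the same body works for any argument that supports the * operator, much like duck typing in Ruby. This is a usage sketch, not something taken from the course.

```python
def my_function(argument):  # the colon opens the function body
    # The body is everything indented below the def line
    return argument * 2

print(my_function(21))    # 42
print(my_function(1.5))   # 3.0
print(my_function("ab"))  # abab  (* repeats a string)
print(my_function([1]))   # [1, 1]  (* repeats a list)
```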

Control structures are much the same as in other languages. Conditionals:

def check(condition1, condition2):  # condition1/condition2: any boolean values
    if condition1:
        return 1
    # elif is like elsif in Ruby and elseif in PHP
    elif condition2:
        return 2
    return 3
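A slightly more concrete sketch with real conditions. It also shows that Python writes boolean operators as words (and, or, not) where Ruby and PHP code usually uses &&, || and !, plus the one-line conditional expression that plays the role of the ternary operator. The function names are made up for illustration.

```python
def sign(n):
    if n > 0:
        return 1
    elif n < 0:
        return -1
    return 0

def is_teen(age):
    # Boolean operators are the words and / or / not
    return age >= 13 and not age > 19

def parity(n):
    # Conditional expression: <value if true> if <condition> else <value if false>
    return "even" if n % 2 == 0 else "odd"

print(sign(-5), is_teen(15), parity(7))  # -1 True odd
```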

And loops:

for item in my_list:
    print(item)
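A few common loop patterns, as a sketch: range() for counting, enumerate() for index/item pairs (similar in spirit to Ruby's each_with_index), and a plain while loop.

```python
my_list = ["daniele", "luca", "michele"]

# range(n) yields 0, 1, ..., n-1
squares = []
for i in range(4):
    squares.append(i * i)
print(squares)  # [0, 1, 4, 9]

# enumerate() yields (index, item) pairs
indexed = []
for index, name in enumerate(my_list):
    indexed.append((index, name))
    print(index, name)

# while loops work as expected
countdown = 3
while countdown > 0:
    countdown -= 1
print(countdown)  # 0
```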

Instead of hashes and arrays, Python uses dictionaries and lists, but they are almost the same:

my_list = ["daniele", "luca", "michele"]
my_dictionary = {"marco": 1, "matteo": 7, "michele": 4}
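A quick sketch of everyday operations on the two collections above: appending and indexing (negative indexes count from the end), key lookup, dict.get() with a default for a missing key, and iterating over key/value pairs with items().

```python
my_list = ["daniele", "luca", "michele"]
my_dictionary = {"marco": 1, "matteo": 7, "michele": 4}

# Lists: zero-based indexing, negative indexes count from the end
my_list.append("marco")
print(my_list[0], my_list[-1])        # daniele marco
print(len(my_list))                   # 4

# Dictionaries: lookup by key; .get() returns a default for missing keys
print(my_dictionary["matteo"])        # 7
print(my_dictionary.get("luca", 0))   # 0
my_dictionary["luca"] = 2             # add a new key

# Iterate over key/value pairs
for name, number in my_dictionary.items():
    print(name, number)
```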

I started only a few hours ago and I’m still a newbie. I hope the rest of the Codecademy course will teach me more complex topics and, as for real-world usage, I probably need to learn how to handle version management and discover the most powerful libraries. Anyway, I can already say that programming in Python is pretty cool 🙂

Last summer I had the pleasure of reviewing a really interesting book about Spark, written by Holden Karau for PacktPub. She is a really smart woman, currently a software development engineer at Google and active in Spark’s developer community. In the past she worked for Microsoft, Amazon and Foursquare.

Spark is a framework for writing fast, distributed programs. It’s similar to Hadoop MapReduce but uses a fast in-memory approach. The Spark ecosystem includes built-in tools for interactive query analysis (Shark), a framework for large-scale graph processing and analysis (Bagel), and a real-time analysis framework (Spark Streaming). I discovered them a few months ago while exploring the extended Hadoop ecosystem.

The book covers how to write distributed, MapReduce-style programs. You’ll find everything you need: setting up a Spark cluster, using the interactive shell, and writing and deploying distributed jobs in Scala, Java and Python. The last chapters look at using Hive with Spark to get a SQL-like query syntax through Shark, and at manipulating resilient distributed datasets (RDDs).
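To give an idea of what "MapReduce style" means, here is the classic word count as a plain-Python sketch that runs locally, without Spark. It only mimics the shape of the computation: a map phase that emits (word, 1) pairs and a reduce phase that sums the counts per key. In PySpark the same pipeline is usually expressed with RDD methods such as flatMap, map and reduceByKey, distributed across the cluster; the book's own examples may look different.

```python
from collections import defaultdict

lines = ["to be or not to be", "to see or not to see"]

# "Map" phase: split every line into words and emit a (word, 1) pair per word
pairs = [(word, 1) for line in lines for word in line.split()]

# "Reduce by key" phase: sum the values emitted for each word
counts = defaultdict(int)
for word, one in pairs:
    counts[word] += one

print(dict(counts))
```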

Have fun reading it! 😀

Fast Data Processing with Spark
by Holden Karau


The title is also listed in the Research Areas & Publications section of the Google Research portal: