13 May 2013

Base64 encoding image files using Python

In an earlier entry I mentioned that I was building a little app to display a bunch of static information. Sometimes I find it immensely useful to embed all resources into a single data file. Not only does it make handling the data simpler, it also makes retrieving it over the internet simpler.

My data has a bunch of text and a few images to go with it. I wrote the Python code below to automate embedding the image data directly alongside the other text that I will be serving.

# At the top of the module:
import base64
import os
import urllib  # Python 2; on Python 3 use urllib.parse.unquote

@staticmethod
def encode_image_as_base64( image_path, base_path ):
    """
        Loads an image from the supplied path (using the base path as a
        reference) and returns it as a base64 encoded string
    """

    # The image path can be a URI with a query string.
    # Remove any query string elements, basically everything following
    # a question (?) mark
    image_path = image_path.split("?")[0]

    # Resolve the full file name and undo any URL escaping (e.g. %20)
    file_name = os.path.join( base_path, image_path )
    file_name = urllib.unquote(file_name)

    print "Encoding image: " + file_name

    with open(file_name, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())

    return encoded_string
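
To see it in action, here is a quick usage sketch (Python 3 this time; the file name, directory, and JSON structure are made up for illustration). It embeds the encoded image into the same document as the text, using a data URI so a web view can render it directly:

```python
import base64
import json
import os
import tempfile

def encode_image_as_base64(image_path, base_path):
    """Simplified Python 3 version of the helper above (no query-string handling)."""
    with open(os.path.join(base_path, image_path), "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("ascii")

# Create a tiny stand-in "image" so the example is self-contained
base_path = tempfile.mkdtemp()
with open(os.path.join(base_path, "logo.png"), "wb") as f:
    f.write(b"\x89PNG\r\n\x1a\n")  # just the PNG magic bytes

# Embed the encoded image alongside the text it belongs to, as a data URI
document = {
    "title": "My static screen",
    "body": "Some text for the screen...",
    "image": "data:image/png;base64," + encode_image_as_base64("logo.png", base_path),
}
payload = json.dumps(document, indent=4)
```

The resulting payload is a single self-contained file: text and images travel together, and the client needs exactly one download per screen.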


Base64 does inflate the images by roughly a third, but it makes data handling clean and simple.

11 May 2013

Calculating File Change List using SHA256

Realised today that I needed a good way to create a change list document for a small app that I am writing. The app has a few hundred screens, each of which is supported by a single data document that contains the information that the screen is showing.

This data is downloaded in individual JSON files from the web the first time the user launches the application. Each of these JSON files can be amended after the app is released. I wanted to be able to provide a quick and easy way for the app to download changelist information to detect if anything needed updating.

I wanted the data to be separated in such a way that if only one file is changed the user only needs to download that one piece of data but not the rest. This is crucial as the entire dataset is just shy of 100MB whereas each individual file averages around 400K.

Python to the rescue

Wrote a little script that creates a single changelist.json file covering each of my data files, using the built-in hashlib hashing library that ships with Python. So simple and quick:

import hashlib
import json
import os

out_file_path = r"C:\outfiles"
jdata = {}
for path, subdirs, files in os.walk(r"C:\myfiles"):
    # Skip the output directory itself so the changelist doesn't hash itself
    if path == out_file_path:
        continue
    for name in files:
        if not name.endswith(".json"):
            continue

        # Hash the raw bytes of each data file and key it by screen name
        json_file_path = os.path.join(path, name)
        with open(json_file_path, 'rb') as json_file:
            jdata[name.replace(".json", "")] = hashlib.sha256(json_file.read()).hexdigest()

out_file_name = os.path.join(out_file_path, "changelist.json")
print "Writing " + str(len(jdata)) + " change list items to output file '" + out_file_name + "'"

# Now write the file to the out location and quit
with open(out_file_name, 'w') as outfile:
    json.dump(jdata, outfile, indent=4, separators=(',', ': '))
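
On the client side, detecting what needs updating is then just a dictionary comparison between the downloaded changelist and the hashes of the files the app has cached. A minimal sketch (Python 3; the screen names and digests here are made up):

```python
def diff_changelists(local, remote):
    """Return the names of data files whose hashes differ, or that are
    missing locally -- i.e. the files the app needs to (re)download."""
    return sorted(
        name for name, digest in remote.items()
        if local.get(name) != digest
    )

# Example: the server's changelist.json vs. what the app has cached
remote = {"screen_home": "abc123", "screen_help": "def456", "screen_new": "789fff"}
local  = {"screen_home": "abc123", "screen_help": "000000"}

changed = diff_changelists(local, remote)  # ['screen_help', 'screen_new']
```

So a client that is fully up to date downloads only the tiny changelist file, and otherwise fetches just the handful of ~400K files that actually changed rather than the full ~100MB dataset.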

Python, gotta love it :)

8 May 2013

Failover resiliency is indeed important

When building software that is somehow dependent on data provided by someone or something outside your direct control, I find myself wondering how reliable that service actually is and what the impact will be when it eventually goes offline (because they all will at some point...).

I experienced this today with the super simple cycle hire page I built a few weeks ago for my mobile. The page quickly gives me the status of the cycle stations close to my home and work, which helps me decide where to go to get home/to work. An example of the page is below:


My simple page relied on the rather excellent free service at http://api.bike-stats.co.uk/ offered by Sachin Handiekar to provide it with the latest docking station information.

But today his service was down and my handy little website showed me nothing :(

Adding failover providers

After gambling on today's docking station numbers I decided to hunt for alternative data providers that I could use in case bike-stats were unavailable again in the future. One such alternative is the equally excellent service provided at http://borisapi.heroku.com/ by Adrian Short.

A simple way to provide this failover is to call my preferred provider first and, if that fails to give me valid data, use the alternative as a backup. Easy enough to hack in PHP as both sites offer a JSON feed:

function URLopen($url)
{
	// Fake the browser type
	ini_set('user_agent', 'MSIE 4.0b2;');
	$dh = fopen($url, 'rb');
	if( $dh === false )
	{
		return null;  // Connection failed; json_decode(null) below yields null
	}
	$result = stream_get_contents($dh);
	fclose($dh);
	return $result;
}

global $json, $isJsonValid, $jsonSource;

// Indicates if we successfully parsed the json data
$isJsonValid = false;

// First try the bike-stats stuff (this might be slow or offline)
$rawjson = URLopen("http://api.bike-stats.co.uk/service/rest/bikestats?format=json");
$json = json_decode($rawjson, true);
if( $json != null )
{		
	$isJsonValid = true;
	$jsonSource = "api.bike-stats.co.uk";
}

// As a backup use the heroku json site	
if( !$isJsonValid )
{
	$rawjson = URLopen("http://borisapi.heroku.com/stations.json");
	$json = json_decode($rawjson, true);
		
	if( $json != null )
	{			
		$isJsonValid = true;
		$jsonSource = "borisapi.heroku.com";
	}
}

Now with my super basic failover in place, my problem became that these two providers, although scraping the same data, present it in slightly different JSON formats.

Going back to my original page, I decided to normalise the JSON data that was used and simply have my failover code perform a quick transformation into the new normalised format. My final improved data sourcing, with failover resiliency and normalisation (wow, quite big words there...), ended up looking like so:

try {
	$rawjson = URLopen("http://api.bike-stats.co.uk/service/rest/bikestats?format=json");
	$jsont = json_decode($rawjson, true);
	if( $jsont != null )
	{
		$json = array();
		foreach( $jsont["dockStation"] as $station )
		{
			$json[] = array(
				"id"    => $station["@ID"],
				"name"  => $station["name"],
				"bikes" => $station["bikesAvailable"],
				"docks" => $station["emptySlots"],
				"lat"   => $station["latitude"],
				"lon"   => $station["longitude"]
			);
		}
		$isJsonValid = true;
		$jsonSource = "api.bike-stats.co.uk";
	}
} catch( Exception $e ) {
	$isJsonValid = false;
	$json = null;
}

// As a backup use the heroku json site
if( !$isJsonValid )
{
	try {
		$rawjson = URLopen("http://borisapi.heroku.com/stations.json");
		$jsont = json_decode($rawjson, true);
		if( $jsont != null )
		{
			$json = array();
			foreach( $jsont as $station )
			{
				$json[] = array(
					"id"    => $station["id"],
					"name"  => $station["name"],
					"bikes" => $station["nb_bikes"],
					"docks" => $station["nb_empty_docks"],
					"lat"   => $station["lat"],
					"lon"   => $station["long"]
				);
			}
			$isJsonValid = true;
			$jsonSource = "borisapi.heroku.com";
		}
	} catch( Exception $e ) {
		$isJsonValid = false;
		$json = null;
	}
}

It is ugly, sure, but it does a marvelous job of getting me the information I need :)
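
For what it's worth, the try-primary-then-fall-back-and-normalise pattern translates neatly to other languages too. Here is a Python 3 sketch of the same idea, with stand-in fetch functions in place of the real HTTP calls (the station data below is invented for the example):

```python
import json

def normalise_bike_stats(raw):
    """Map the bike-stats schema to the common format used above."""
    return [
        {"id": s["@ID"], "name": s["name"],
         "bikes": s["bikesAvailable"], "docks": s["emptySlots"],
         "lat": s["latitude"], "lon": s["longitude"]}
        for s in raw["dockStation"]
    ]

def normalise_borisapi(raw):
    """Map the borisapi schema to the same common format."""
    return [
        {"id": s["id"], "name": s["name"],
         "bikes": s["nb_bikes"], "docks": s["nb_empty_docks"],
         "lat": s["lat"], "lon": s["long"]}
        for s in raw
    ]

def fetch_stations(providers):
    """Try each (source, fetch, normalise) triple in order; return the first
    result that parses, plus the name of the provider that supplied it."""
    for source, fetch, normalise in providers:
        try:
            raw = json.loads(fetch())
        except (ValueError, OSError):
            continue  # bad JSON or a network failure -- try the next provider
        if raw:
            return normalise(raw), source
    return None, None

# Stand-in fetchers; real ones would do an HTTP GET against each API
primary = lambda: "not json at all"  # simulate the outage
backup = lambda: json.dumps([{"id": 1, "name": "Hyde Park", "nb_bikes": 5,
                              "nb_empty_docks": 10, "lat": 51.5, "long": -0.16}])

stations, source = fetch_stations([
    ("api.bike-stats.co.uk", primary, normalise_bike_stats),
    ("borisapi.heroku.com", backup, normalise_borisapi),
])
```

Because everything downstream only ever sees the normalised format, adding a third provider is just one more fetch/normalise pair in the list.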

Want your own map?

A little while ago I designed a simple customization page that anyone can use to create their own light-weight bike stats web page. These pages are best viewed on a mobile device (such as a phone).

Get your own right here :)

Happy cycling...