13 May 2013

Base64 encoding image files using Python

In an earlier entry I mentioned that I was building a little app to display a bunch of static information. Sometimes I find it immensely useful to embed all resources into a single data-file. Not only does it make handling the data simpler but also retrieving it over the internet.

My data has a bunch of text and a few images as well to go with it. I wrote the code below (in python) to automate embedding of image data directly in with the other text that I will be serving.

@staticmethod
def encode_image_as_base64( image_path, base_path ):
    """ 
        Loads an image from the supplied path (using the base path as a 
        reference) and returns it as a base64 encoded string
    """
    
    # The image path can be a URI with a query string. 
    # Remove any query string elements, basically everything following 
    # a question (?) mark
    qs_split = image_path.split("?")
    image_path = qs_split[0]
    
    file_name = os.path.join( base_path, image_path)
    file_name = urllib.unquote(file_name)
    
    print "Encoding image: "+file_name
    
    encoded_string = ""
    with open(file_name, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read())

    return encoded_string


It may not compress all that well but it makes data handling clean and simple.

11 May 2013

Calculating File Change List using SHA256

Realised today that I needed a good way to create a change list document for a small app that I am writing. The app has a few hundred screens, each of which is supported by a single data document that contains the information that the screen is showing.

This data is downloaded in individual JSON files from the web the first time the user launches the application. Each of these JSON files can be amended after the app is released. I wanted to be able to provide a quick and easy way for the app to download changelist information to detect if anything needed updating.

I wanted the data to be separated in such a way that if only one file is changed the user only needs to download that one piece of data but not the rest. This is crucial as the entire dataset is just shy of 100MB whereas each individual file averages around 400K.

Python to the rescue

Wrote a little script that creates a single changelist.json file for each of my data files using the built in SHA256 key generation library that ships with python. So simple and quick:

out_file_path = r"C:\outfiles"
jdata = {}
for path, subdirs, files in os.walk(r"C:\myfiles"):
    for name in files:
        if( not name.endswith(".json") or path == out_file_path ):
            continue
            
        json_file_path = os.path.join(path, name)
        
        if( json_file_path == out_file_path ):
            continue
        
        jdata[name.replace(".json", "")] = hashlib.sha256(open(json_file_path, 'rb').read()).hexdigest()
        
out_file_name = os.path.join(out_file_path, "changelist.json")
print "Writing "+str(len(jdata))+" change list items to output file '"+str(out_file_name)+"'"

# Now write the file to the out location and quit
with open(out_file_name, 'w') as outfile:
    json.dump(jdata, outfile, indent=4, separators=(',', ': '))

Python, gotta love it :)

8 May 2013

Failover resiliency is indeed important

When building software that is somehow dependant on data provided by someone or something that is out of your direct control I find myself wondering how reliable that service actually is and what the impact is when it eventually goes offline (because they all will at some point...)

I experienced this today with my super simple cycle hire page I built a few weeks for my mobile. The page quickly gives me the status of cycle stations close to my home and work which helps me decide where to go to get home/to work. An example of the page is below:


My simple page relied on the rather excellent free service at http://api.bike-stats.co.uk/ offered by Sachin Handiekar to provide it with the latest docking station information.

But today his service was down and my handy little website showed me nothing
:(

Adding failover providers

After gambling on today's docking stations numbers I decided to hunt for alternative data providers that I could use in case bike-stats were unavailable again in the future. One such alternative service is another excellent one provided at http://borisapi.heroku.com/ by Adrian Short.

A simple way to provide this failover would be to simply call my preferred one first and if that failed to give me valid data I would use the alternative as a backup. Easy enough to hack in php as both sites offered a JSON feed:

function URLopen($url) 
{ 
	// Fake the browser type
	ini_set('user_agent','MSIE 4\.0b2;');
	$dh = fopen("$url",'rb'); 
	$result = stream_get_contents($dh);
	fclose($dh);
	return $result; 
} 

global $json, $isJsonValid, $jsonSource;

// Indicates if we successfully parsed the json data
$isJsonValid = false;

// First try the bike-stats stuff (this might be slow or offline)
$rawjson = URLopen("http://api.bike-stats.co.uk/service/rest/bikestats?format=json");
$json = json_decode($rawjson, true);
if( $json != null )
{		
	$isJsonValid = true;
	$jsonSource = "api.bike-stats.co.uk";
}

// As a backup use the heroku json site	
if( !$isJsonValid )
{
	$rawjson = URLopen("http://borisapi.heroku.com/stations.json");
	$json = json_decode($rawjson, true);
		
	if( $json != null )
	{			
		$isJsonValid = true;
		$jsonSource = "borisapi.heroku.com";
	}
}

Now with my super basic failover my problem became that these two providers, although scraping the same data, present the JSON data in slightly different formats.

Going back to my original page I decided to normalise the json data that was used and simply have my failover code perform a quick transformation of the data to my new normalised format. So my final improved data sourcing with failover resiliency and normalisation (wow quite big words there...) ended up looking like so (sorry for the indentation):

try {
$rawjson = URLopen("http://api.bike-stats.co.uk/service/rest/bikestats?format=json");
$jsont = json_decode($rawjson, true);
if( $jsont != null ){		
	$json = array();
	foreach( $jsont["dockStation"] as $station){
		$json[] = array(
		"id"=> $station["@ID"],
		"name" => $station["name"], 
		"bikes" => $station["bikesAvailable"], 
		"docks" => $station["emptySlots"],
		"lat" =>  $station["latitude"],
		"lon" =>  $station["longitude"]
		);
	}
	$isJsonValid = true;
	$jsonSource = "api.bike-stats.co.uk";
}
}catch (Exception $e){
	$isJsonValid = false;
	$json = null;
}

// As a backup use the heroku json site	
if( !$isJsonValid ){
try{
$rawjson = URLopen("http://borisapi.heroku.com/stations.json");
$jsont = json_decode($rawjson, true);
if( $jsont != null ){
	$json = array();
	foreach( $jsont as $station){
		$json[] = array(
		"id"=> $station["id"],
		"name" => $station["name"], 
		"bikes" => $station["nb_bikes"], 
		"docks" => $station["nb_empty_docks"],
		"lat" =>  $station["lat"],
		"lon" =>  $station["long"]
		);
	}
	$isJsonValid = true;
	$jsonSource = "borisapi.heroku.com";
}
}catch( Exception $e ){
	$isJsonValid = false;
	$json = null;
}}
It is ugly sure, but does a marvelous job getting me the information I need :)

Want your own map?

A little while ago I designed a simple customization page that anyone can use to create their own light-weight bike stats web page. These pages are best viewed on a mobile device (such as a phone).

Get your own right here :)

Happy cycling...

20 April 2013

Layouts in Android not switching correctly between portrait and landscape

After having followed all the examples on the Android developer site to the letter about supplying multiple layouts for different screen sizes and orientation I was still having issues with them on extra large screens (10" tablets).

Basically the layout would not load when I rotated the tablet while the application was running.

If I started the application in portrait then that layout was correctly selected and when I started the app in landscape that layout was likewise correctly loaded. But after one layout was loaded the orientation change did not trigger Android to switch automatically.

The culprit was in my AndroidManifest.xml file. For some reason when I first created my activity the configChanges attribute had been set automatically for me to:
android:configChanges="keyboardHidden|orientation|screenSize"
removing screenSize solved the issue:
android:configChanges="keyboardHidden|orientation"

Obvious eh...

Edit: This behavior is actually documented in the android development guide.

19 April 2013

Fast Flicking Between Images on Android

So I wanted to be able to have a simple control that supported swapping between a list of images as the user swiped/flicked their finger across the image.

My initial solution was to have a HorizontalScrollView which showed all my images in a row. This worked pretty well but I wasn't able to accurately stop to show each image on the screen as the user flicked between them. They always seemed to end up half off-screen.

After spending a few hours attempting to wrestle the scrolling events in the HorizontalScrollViewer under my control I realised that I must be doing something wrong, surely this isn't this difficult.

ViewPager

And yes, it couldn't have been simpler. More information on the android blog. But basically I ended up creating a simple PagerAdapter like so:

public class ImagePagerAdapter extends PagerAdapter 
{
 Bitmap[] _images = null;

 public ImagePagerAdapter( Bitmap[] images ) 
 {
  _images = images;
 }

 @Override
 public int getCount() 
 {
  return _images == null ? 0 : _images.length;
 }

 @Override
 public boolean isViewFromObject(View view, Object object) 
 {
  return view == ((ImageView)object);
 }
 
 @Override
    public Object instantiateItem(ViewGroup container, int position) 
 {
  ImageView img = null;
  
  if( position >= 0 && position < _images.length )
  {
   final Bitmap bmp = _images[position];
   
   img = new ImageView(container.getContext());
   img.setLayoutParams(new LayoutParams(LayoutParams.MATCH_PARENT, LayoutParams.MATCH_PARENT));
   img.setImageBitmap(bmp);
   img.setScaleType(ImageView.ScaleType.CENTER_CROP);
   ((ViewPager)container).addView(img, 0);
  }
  
  return img;
    }

    @Override
    public void destroyItem(ViewGroup container, int position, Object object) 
    {
     ((ViewPager)container).removeView((ImageView)object);
    }

}

Then in my Activity/Fragment class it was a simple matter of just assigning the adapter to my pager and that is it!

ImagePagerAdapter adapter = new ImagePagerAdapter(bitmaps);
((ViewPager)v.findViewById(R.id.fragment_bird_banner_viewpager)).setAdapter(adapter);

Lesson: If things that you feel should be basic are becoming too complicated and taking too much time. You're probably doing them wrong. Step back and search/read a little bit. :)

13 April 2013

Is my mobile app really free when it is serving ads?

I use this rather handy app, BaconReader from http://onelouder.com/ to browse Reddit.com.

I connected my tablet to my computer and turned on LogCat as I was about to start debugging my own little app. Instead two little log lines caught my eye:

D/baconreader(25507): Message Service Started
D/baconreader(25507): Checking for messages

That was something I thought shouldn't be running at all, I launched the BaconReader app to double-check that all settings related to notification and background activity for that app were indeed turned off  or set to manual (I do this in effort to conserve battery power). They were all off but still the app was periodically starting up a service to check for messages (thanks for that).

While doing this I glimpsed how much excess network activity the application was performing while I was under the impression that it was "idle" (i.e. I wasn't interacting with it).

Usage and User Profiling

The application is using what seems two different platforms to track users and usage patterns Flurry and Google Analytics. Fair enough, nothing too fishy about that I guess. Both services seemed to communicate very sparingly with the server with only an initial message sent when the application started up and then stayed silent while I let the app sit "idle".

Advertisements

I have no problems with apps that serve ads in exchange for me being able to use them for free. What I found curious was that this app seems to be contacting three different ad services with varying levels of details about both me and my device.

Millennial Media
This component, initiated a HTTP request twice every second, each time with a very verbose URL. Below is an example of the query parameters for one request:

accelerometer:true
adtype:MMBannerAdTop
ar:manual
bl:46
cachedvideo:false
conn:wifi
country:GB
density:2.0
dm:Nexus 10
dv:Android4.2.2
hdid:mmh_94888091071502DC8F18CE0A08CBFA79_6AC9AD8BDA218890306A1DAF963D603E089F82C9
height:53
hpx:2464
hsht:53
hswd:320
language:en
loc:false
mmdid:mmh_94888091071502DC8F18CE0A08CBFA79_6AC9AD8BDA218890306A1DAF963D603E089F82C9
mmisdk:4.6.0-12.07.16.a
pkid:com.onelouder.baconreader
pknm:BaconReader
plugged:true
reqtype:getad
sdkapid:62626
space:16810389504
ua:Mozilla/5.0 (Linux; U; Android 4.2.2; en-gb; Nexus 10 Build/JDQ39) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30Nexus 10
video:true
width:320
wpx:1

I'd love to know the reason why they need to know why my device is plugged in or not, how much space is free on it and what connection method I'm using to connect to the internet. Seems a little much but still nothing surprising and unexpected from ad requests. The standard tracking-cookie for my device is even included in the hdid variable (I assume) but no GPS information was transmitted (even though GPS was enabled for apps on my tablet).

Google AdMob
Pretty standard AdMob (old doubleclick.net) requests really. Nothing unexpected here, they even try to minimise the amount of characters traveling over the network. How nice of them! Still they sent two requests every second. Below is an example of their querystring:

preqs:0
session_id:6944941379333807684
u_sd:2
seq_num:1
u_w:800
msid:com.onelouder.baconreader
js:afma-sdk-a-v6.2.1
mv:8016014.com.android.vending
isu:94888091071502DC8F18CE0A08CBFA79
cipa:1
bas_off:0
format:320x50_mb
oar:0
net:wi
app_name:38.android.com.onelouder.baconreader
hl:en
gnt:0
u_h:1232
carrier:
bas_on:0
ptime:0
u_audio:1
u_so:p
output:html
region:mobile_app
u_tz:60
client_sdk:1
ex:1
slotname:a14eb80d44ba957
caps:inlineVideo_interactiveVideo_mraid1_th_autoplay_mediation_sdkAdmobApiForAds_di
jsv:46
This by far was the worst offender. AdMarvel was a small mobile ad startup that was recently (in 2010) bought by Opera Software. The device performed 4 requests every second (where two requests were identical except for one querystring variable had changed, that is retrynum had been incremented from 0 to 1). Below is an example of one of their query string:

site_id:23206
partner_id:7b862efbe6c75952
timeout:5000
version:1.5
language:java
format:android
sdk_version:2.3.7.1
sdk_version_date:2013-02-04
sdk_supported:_admob_millennial_amazon
device_model:Nexus 10
device_name:JDQ39
device_systemversion:4.2.2
retrynum:0
excluded_banners:
device_orientation:portrait
device_connectivity:wifi
resolution_width:1600
max_image_width:1600
resolution_height:2464
max_image_height:2464
device_density:2.0
device_os:Android
adtype:banner
device_details:brand:google,model:Nexus 10,width:1600,height:2464,os:4.2.2,ua:Mozilla/5.0 (Linux; U; Android 4.2.2; en-gb; Nexus 10 Build/JDQ39) AppleWebKit/525.10+ (KHTML, like Gecko) Version/3.0.4 Mobile Safari/523.12.2
hardware_accelerated:true
target_params:UNIQUE_ID=>f780070d9537897a||GEOLOCATION=>59.4954954894955%2C-0.99682319125740403||bucket=>4||subreditname=>Front+Page||RTBID=>FBATTRID%3Ae939e527-35db-4e98-8d73-e05fead999b1||APP_VERSION=>2.8.1||RESPONSE_TYPE=>xml_with_xhtml||BNG=>0

What I find worrying here is (besides the size of the request) their final target_params variable which not only includes the ID for my device and exactly what part of the application I am viewing but also a pretty accurate GPS location (disabling GPS access for all apps got rid of that) and what looks to be Facebook related FBATTRID tag to serve ads from Facebook's MoPub Marketplace.

Talk about being tracked between devices.

Conclusion

All of this activity was happening multiple times a second for all three advertising services on my device. So six times a second my device was communicating tracking information and requesting advertisements back.

One of the most battery draining activities is transmitting data through the air on a mobile device. Based on these three services it seems like the industry standard is to query twice every second which to me feels overly aggressive and costly on my battery charge.

So what are we really getting for free? How much money are we spending on electricity to power our devices that is then used directly to serve us advertisements?

Perhaps I should just buy the thing, probably cheaper in the long run!