Predict.ly: October 2009

Saturday, 31 October 2009

Servers

Varnish is installed, nginx is back.

Need to configure varnish but that can wait.

varnishstat varnishtop - very nice.

The haproxy check is hitting varnish and getting a 0.9828 cache hit ratio - OK, easy when only 1 file is active, but it's working!

Thursday, 29 October 2009

Architecture

New plan, save me reinventing the caching wheel.

haproxy-->POST DATA---------------------------> apache
haproxy-->gzip enabled client --> nginx -> varnish -> apache
haproxy-->no gzip----------------> varnish -> apache
haproxy-->static--------> nginx

Varnish is a cool caching proxy that understands how to stitch together parts of documents. It's basically SSI on steroids and will handle all caching of page content. The cool thing is I can just add another node with Varnish on it, et voila, instant cache. No need to pre-generate etc.

The only downside is that it doesn't produce gzip content (in the way it'll be used in this instance). That's why nginx will be in front of it, purely to gzip its output.
Lighttpd is getting scrapped again for nginx because I won't be using SSI, and nginx will push static content directly, bypassing Varnish. Why cache static stuff?
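
A rough sketch of the routing plan above, in Python purely for illustration (the real rules would live in the haproxy config). The order in which the rules are checked, the Request class and the list of static suffixes are my assumptions, not something decided yet.

```python
from dataclasses import dataclass
from typing import List

# Assumed list of static file types; adjust to whatever nginx ends up serving.
STATIC_SUFFIXES = (".jpg", ".jpeg", ".gif", ".css", ".png", ".js", ".ico")


@dataclass
class Request:
    method: str
    path: str
    accept_encoding: str = ""


def pick_chain(req: Request) -> List[str]:
    """Return the chain of servers a request passes through after haproxy."""
    if req.method.upper() == "POST":
        # POST data is never cached, so it goes straight to apache.
        return ["apache"]
    if req.path.lower().endswith(STATIC_SUFFIXES):
        # Static files are served by nginx and bypass varnish entirely.
        return ["nginx"]
    if "gzip" in req.accept_encoding.lower():
        # nginx sits in front of varnish purely to gzip its output.
        return ["nginx", "varnish", "apache"]
    return ["varnish", "apache"]
```

For example, pick_chain(Request("GET", "/some/page", "gzip, deflate")) gives the nginx -> varnish -> apache chain, while the same request without gzip support skips nginx.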

On the mysql front, I'm looking at DRBD as an initial alternative to replication. This is a bit like RAID 1 over a network with failover. It's different to master/master replication in that you can't use both masters concurrently.

A very similar scheme is used on our massive db in work. This one is quite complicated to setup, so it might not happen for now - I will probably give it a go and see how I get on.

Tuesday, 27 October 2009

Sharding Keys 2

Slept on it and I'm happy with the keys.

I'm shying away from an earlier idea of precaching everything on the site and using SSI. The delay is just too big and I'm also concerned about disk IO.

I want votes and other things to be instant.

Questions

Do you allow non members to vote? (keep a tally of members vs visitors voting)

Is it acceptable to show the update immediately to the person voting/etc, then renew the cache for everyone else slightly later?
- if so, what delay?

If I monitored what was active I could change cache procedures dynamically, heavy write frequency reduces the advantage of caching.

That sounds almost intelligent.

Monday, 26 October 2009

Sharding Keys

I've been going over the problem of keys for sharding all day.

I've come full circle and realised auto-increment works perfectly fine.

The trick is that as everything is sharded on user, as long as the userID is present with the item's auto-increment, it will work without an issue.

Let's take Jonno's prediction 500

Jonno@prediction@500 - seems ok to me

Wherever jonno goes in terms of db shards, his 500th prediction will follow.

This essentially boils down to: find where jonno is, then take prediction 500 from there.

If on the other hand you give each prediction a unique ID, and then try to find it, you'd need to look it up or look in all shards.

This ID will also change anytime more shards are made, thus all urls would break.

Inside the shard itself, using Jonno as the key everywhere isn't so great. Instead it's translated to an int, and then works as per usual inside the shard. All cross shard queries would revert to the char username.
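
The "find where jonno is, then take prediction 500 from there" lookup can be sketched like this. It is a minimal illustration in Python; the shard names and the USER_SHARD directory are hypothetical stand-ins for whatever ends up tracking which shard a user lives on.

```python
from typing import Dict, NamedTuple, Tuple


class ItemKey(NamedTuple):
    user: str    # the sharding key
    kind: str    # e.g. "prediction"
    number: int  # per-user auto-increment


def parse_key(key: str) -> ItemKey:
    """Split a key such as 'Jonno@prediction@500' into its three parts."""
    user, kind, number = key.split("@")
    return ItemKey(user.lower(), kind, int(number))


# Hypothetical directory of user -> shard. Moving a user only changes this
# entry; the item's key (and therefore its URL) stays the same.
USER_SHARD: Dict[str, str] = {"jonno": "shard_a"}


def locate(key: str, directory: Dict[str, str] = USER_SHARD) -> Tuple[str, ItemKey]:
    """Find the user's shard first, then address the item inside it."""
    item = parse_key(key)
    return directory[item.user], item
```

locate("Jonno@prediction@500") returns shard_a plus the parsed key. If jonno is later moved to another shard, only the directory entry changes and the same key still resolves.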

Sunday, 25 October 2009

Sharding

Sharding is nearly every web 2.0 site's answer to database scalability.

The advice is to design to allow sharding.

If/when predict.ly becomes immensely popular, data should be sharded.

However, I'd really like to play sharding now. Therefore I've devised (after quite some hours) a scheme to shard the data. Based on what I've read it is fundamentally the web 2.0 scheme of sharding by user ID.

Everything on the site is created/owned by a user (even system users). All keys will be a combination of user/item id. Mysql autonumbers are not really viable anymore so a basic sequence implementation is needed.

I've gone over the scheme 20 times and I'm fairly confident. The problem will be getting zend framework/doctrine to deal with the sharding at the application level. This looks very useful on dealing with it in doctrine/zend: http://blog.routydesign.com/?p=62

The other issue is - I don't have the hardware to run sharded databases. I only have 2 database servers. However the shards will be placed in differing database names on the same servers. No cross database queries are allowed (not sure if mysql even supports them to be honest).

This will to all intents and purposes create an application level that is using a sharded database. When the database actually needs to scale, I will migrate the shards out initially to 2 additional databases, which will be a very natural migration as the whole databases can be moved in a single action.

To reduce the impact of this admittedly too early optimisation there will only be 2 shards. The stages that need to query multiple shards will be doubling up on work, but that's the price of sharding.

The biggest potential penalties are tags and comments.

Tags
Initially the tag table will be in a database of its own with no joins. Each prediction is limited to 5 tags - so a simple select with 5 primary keys to get the tags for a prediction. Getting tags associated with a user will be similar: user > predictions > select tags by primary key. If the tags database has to be sharded, then multiple queries will be required.

The problem becomes finding out what tags are the most common. Each user shard will need to be queried and then an aggregate produced. Luckily this is not an operation that is required frequently.
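
The aggregation step is simple enough once each shard has returned its own tag counts. A minimal Python sketch, assuming each shard's result has already been fetched as a tag -> count mapping (how the per-shard query is run is left out):

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def top_tags(shard_counts: Iterable[Mapping[str, int]], n: int = 10) -> List[Tuple[str, int]]:
    """Merge per-shard tag counts and return the n most common tags overall."""
    total: Counter = Counter()
    for counts in shard_counts:
        # Counter.update adds the counts rather than replacing them.
        total.update(counts)
    return total.most_common(n)
```

Note that a tag which is only middling on every shard can still come out on top overall, so each shard has to return its full counts (or a generous top-N), not just its own winner.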

Comments
Loading the comments is simple enough because the comments for a prediction are held in the same shard, simple join. Each comment has an author, and we need to know some detail about the author: name/rank/avatar/etc.

This is a killer. It's fine when there are 2 shards. You have at worst 2 queries per prediction view for the comments. However as user is the primary shard mechanism, let's say it went to 26 shards - that's 26 queries just for the comments.

All advice talks about denormalising the data, but if that user's rank/avatar changes then you'd need to make the update potentially on all 26 shards. Now that I think about it, maybe that isn't too painful? How often would the image url/rank change? Seldom writes compared to heavy reads. Denormalisation might just be the key.

I had toyed with the idea of recording comments/vibes asynchronously however I'm now of the opinion it would damage usability too much. Instead the user posting the vibe/comment will see the change immediately - other users will experience a delay as the cache refreshes however they won't actually know it's delayed. I think the lesson is to show change to the person making the change to confirm it has occurred happily. Everyone else can wait a little while.

3 Weeks old!

The idea for Predict.ly is 3 weeks old today!

Progress has been pretty good for a part time project I think. Servers are in a bit of a mess again, but nothing major - I turned off lighttpd temporarily until I can sort out the log rotations.

Concept is now pretty solid, db is looking good, code is started, pretty bits of the design are started - not bad.

DNS has been switched over to the new webhost and I've started migrating all my other domains too.

Unfortunately my main email has been knocked offline due to the old host being useless! Hope to have it back within 48 hours.

Saturday, 24 October 2009

Replicated Sessions

Replicated sessions up and working, one small step and all that.

The issue of friends/limited visibility of predictions is bothering me.

OpenID has no notion of friends. Therefore using OpenID does not allow the import of friends from other sites.

OAuth on the other hand does allow for the import of some friends data, but as far as I can tell this isn't very standard and custom work would be required per provider.

Need to read more, but it is beginning to look like a bit of a pain in the bottom.

Progress

Found a very efficient solution for views, vibes and votes.

Efficient and solid, feel better.

Friday, 23 October 2009

Thank you

Why PHP why?

The design of PHP frameworks is frickin painful.

Why are they so against autocomplete?

private function getForm()
{
    $form = new Zend_Form();
    $form->addElement('text', 'name', array(
        'label' => 'Your name',
        'required' => true
    ));
    $form->addElement('textarea', 'message', array(
        'label' => 'Message',
        'required' => true,
        'rows' => 4
    ));
    $form->addElement('submit', 'send');
    return $form;
}

That is just wrong. Wrong! Wrong!

I'm seriously thinking of pulling out my year old unmaintained mini framework or going to Java. This is just not funny.

Hack Day 2

I have picked php as the main language for the website front end. This will handle forms and what not.

Java will probably be used for non-frontend facing bits and bobs. Cache regen, twitter bridge, Google wave bridge, etc.

If I want a msn bridge it'll need to be C#.

I've picked zend framework to do most of the php work by the scientific method of spinning a bottle. Only to be horrified that it doesn't generate models from the db! A quick google later and doctrine saves the day.

The doctrine generated output is a bit hit and miss. Mostly miss. I wonder if there is a way to set how it does it.

What kind of sick mind didn't include auto generation in zend, freaks.

SSI

Nginx SSI doesn't support the last modified header which is a pain in the bottom.

Lighttpd and apache both support it in the manner I desire.

Looks like Lighttpd is back and nginx is out.

Concept Change

Speaking to a couple of people it seemed clear that a lot of folks didn't really see the point in the idea. Maybe everyone isn't as pedantic as myself.

This got me wondering about the barrier to entry. I had already planned to fully use open ID so that the signup process would be virtually eliminated. I then got to thinking that maybe the act of filling in the basic form for making a prediction would reduce the number of users.

I think the PoLR guys suggested being able to make a prediction from twitter. I really liked this idea. I started playing about with various possibilities and it looks like users will be able to make predictions from a whole set of different sources:

Twitter - requires no initial visit to site
Google Wave - requires no initial visit to site (need to double check google wave auth system)
Blog (any platform) - requires 1 initial visit to site to register blog.

Facebook - Looks like it should be possible, still to be confirmed.
Msn, and others - still investigating.
Forums - no feasible method found, yet.
Email - has huge complications, not sure it will be feasible in the short term.

Other issues raised

Visibility: can a prediction be limited to just friends/specific users, or is it public?
Group Predictions: can a prediction be owned by more than one person?

Wednesday, 21 October 2009

This does exist already!

In a previous post I said

Is it just me or wouldn't it be nice if web servers were capable of collating document fragments and sending them as a single document?

Well surprise, it does exist! SSI! Now that's something I haven't heard in YEARS! Indeed when I started web dev in 19-coff-sumthin-coff the initial debate was about getting rid of SSI and using PHP!

This feels weird, it's like I've gone full circle.

Just testing something

I predict that I will be cracking open the Bacardi tonight.

Predictly Seal

Victory Lap

Heartbeat reinstalled perfectly taking up my previous configuration.

Mon had various issues and I gave up. Have used a simple bash script to monitor haproxy and shutdown heartbeat if haproxy process is not running or if it is not responding to http requests.

Ran various tests and they all passed. The failover solution is slightly slower than mon. I could run the script as a daemon to reduce the delay, but for now I can live with the 1 minute of downtime.
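
For the record, the logic of the watchdog is roughly the following. This is a Python sketch rather than the actual bash script; the helper names are made up, and the command used to stop heartbeat is deliberately left as a callable because it depends on the install.

```python
import http.client
import subprocess
import urllib.request
from typing import Callable


def process_running(name: str) -> bool:
    """True if a process with exactly this name exists (pgrep exits 0 on a match)."""
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """True if the URL answers without an HTTP or network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses; timeouts land here too.
        return False


def check_once(is_running: Callable[[], bool],
               responds: Callable[[], bool],
               stop_heartbeat: Callable[[], None]) -> bool:
    """Run one health check; stop heartbeat if haproxy is dead or not responding."""
    healthy = is_running() and responds()
    if not healthy:
        # Stopping heartbeat lets the other server take over the IP.
        stop_heartbeat()
    return healthy
```

Run from cron once a minute this gives the same worst case of about a minute of downtime; looping with a short sleep would be the daemon variant.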

I'm beginning to understand the beauty of the linux ethos of do one thing well.

Tuesday, 20 October 2009

Tired.

Keepalived has beaten me. I've gone back to heartbeat and looking at mon.

Monday, 19 October 2009

Haproxy

The version of Haproxy I have has issues, need to consider a manual upgrade.

Additionally, stupidly, I have nothing to deal with haproxy itself failing as I found out tonight. Need to install keepalived for failover.

Logo 2?

What ya think?

Site Design

Just spoke to Lynne @ PoLR and she is going to try and squeeze me in - if you are in the UK and are looking for design work or SEO - you won't find better than these guys.

If you are involved in e-commerce in any way check out this guide for christmas.

I don't think she was impressed with my ms paint logo -

Sunday, 18 October 2009

No more www.

Added a rewrite to nginx to remove www. from requests - this has been bugging me for days.

DNS

Added new DNS entries for various elements.

Part of this is due to the advice from yahoo about having images/static assets load from more than one domain as this will allow them to load in parallel in most browsers - now have static1.predict.ly and static2.predict.ly for this reason.

vote.my.predict.ly - will handle all votes
comment.my.predict.ly - will handle all comments
my.predict.ly - will handle all other dynamic aspects such as login, authoring, dashboard, account settings, etc.

Cookies will flow from my.predict.ly to vote and comment automagically, so once authenticated vote/comment will know it.

Those cookies won't be sent to the main site or for static assets, which saves a little bit of bandwidth.
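
The cookie behaviour comes down to the usual domain matching rule: a cookie set for my.predict.ly goes to that host and anything underneath it, and nowhere else. A small Python sketch of that rule (simplified, it ignores IP addresses and paths; the function name is made up):

```python
def cookie_sent_to(host: str, cookie_domain: str) -> bool:
    """True if a cookie scoped to cookie_domain would be sent to host."""
    host = host.lower().rstrip(".")
    domain = cookie_domain.lower().lstrip(".")
    # Exact match, or host is a subdomain of the cookie's domain.
    return host == domain or host.endswith("." + domain)
```

So cookie_sent_to("vote.my.predict.ly", "my.predict.ly") is True, while predict.ly and static1.predict.ly both come back False.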

I feel more comfortable with the subdomains in place, it will if required make things a lot easier to scale without relying on a single set of load balancers. Each domain could easily have its own set.

I must admit it took heavy head scratching to get vhosts setup everywhere.

Email

Outgoing email is now routed via Gmail. Tested from the command line and php - all is well in the world.

YSlow

I just did a simple test with YSlow on the predict.ly holding page.

It highlighted that nginx is not sending expires headers on images. Found the following to fix:

# serve static files directly
location ~* ^.+\.(jpg|jpeg|gif|css|png|js|ico|html)$
{
    access_log off;
    expires 30d;
}

It also highlighted I don't have a favicon.

Rsync

Set up a couple of users to handle rsync of cache files between the servers and got it working with ssh etc.

If the cache has a structure similar to:

/vo/1/2/3/4/1234001/files

Running rsync with just /vo/ might be quite painful if it has to traverse everything to work out what has changed. Perhaps I will need to look at triggering rsync as soon as a new set of files has been created, and running it at the level of: /vo/1/2/3/4/1234001/

That should decrease the time taken for cache to be available on both servers and reduce traversal overhead.
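
A sketch of what triggering rsync per item might look like, in Python. The directory scheme (first four digits of the item id as nested directories) is my reading of the example path above, and the remote host name is whatever the other server is called; both are assumptions.

```python
from pathlib import PurePosixPath
from typing import List


def cache_dir(item_id: int, root: str = "/vo", depth: int = 4) -> PurePosixPath:
    """Map an item id such as 1234001 to /vo/1/2/3/4/1234001."""
    digits = str(item_id)
    if len(digits) < depth:
        raise ValueError("item id has fewer digits than the directory depth")
    return PurePosixPath(root, *digits[:depth], digits)


def rsync_command(item_id: int, remote_host: str) -> List[str]:
    """Build an rsync call that syncs only this item's directory."""
    directory = cache_dir(item_id)
    # Trailing slashes: copy the directory's contents into the same path remotely.
    # The parent directories must already exist on the remote side.
    return ["rsync", "-a", f"{directory}/", f"{remote_host}:{directory}/"]
```

The resulting list can be passed straight to subprocess.run once the files for an item have been written.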

Saturday, 17 October 2009

Hack Day 1 - Progress

It has been a fairly short hack day because I've been asked to head out for dinner and fun stuff.

Database draft is complete, although I'm not happy with the merit setup and will most likely come back to that tomorrow. I want merits to be replaced, not just added continually. I think I'm modelling it wrong. Now that I write about it, the solution has appeared in my brain and it's much simpler than what I have.

Added friend concept, organised multiple authentication origins into a basic structure, prediction tables look fairly good as does comments/votes.

My mysql replication has broken because I changed the database name. It will be an interesting task resolving this - my first mysql replication problem. Based on this I'm going to add replication monitoring to the haproxy setup so that a stale server is not used.

Redis is not and will never be available for windows.

Desired Webserver Ability

Is it just me or wouldn't it be nice if web servers were capable of collating document fragments and sending them as a single document?

Example:

Dir:
/doc1.html.1
/doc1.html.2
/doc1.html.3

Request: blah.com/doc1.html

Deliver: The combination of the three files in order and use the most recent file time as the last modified header.

Sure you can do it in language X, but wouldn't it be nice for such a common scenario to just work out of the box?
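
Just to pin down what I mean, here is the behaviour done "in language X" (Python in this case). It is only an illustration of the desired result; the collate function and its signature are made up.

```python
from email.utils import formatdate
from pathlib import Path
from typing import Tuple


def collate(docroot: Path, name: str) -> Tuple[bytes, str]:
    """Join name.1, name.2, ... in numeric order.

    Returns the combined body and a Last-Modified header value taken from
    the most recently changed fragment.
    """
    parts = [p for p in docroot.glob(name + ".*") if p.suffix[1:].isdigit()]
    if not parts:
        raise FileNotFoundError(name)
    # Sort numerically so that fragment 10 comes after fragment 2.
    parts.sort(key=lambda p: int(p.suffix[1:]))
    body = b"".join(p.read_bytes() for p in parts)
    newest = max(p.stat().st_mtime for p in parts)
    return body, formatdate(newest, usegmt=True)
```

A request for doc1.html would then be answered with the body and the header from collate(docroot, "doc1.html").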

Hack Day 1

Hack day 1 has arrived, time to start building the application code.

I bought something of a monster pc setup recently and until now I haven't had a single programming language or IDE installed. Today I'm installing wamp (although I already have master/master mysql setup locally). I need to see if Redis installs on windows for local development.

Today I want to get the majority of the database design completed and start on the use cases.

Friday, 16 October 2009

MySQL Load Balancing

I've set up mysql in haproxy so that both servers are checked every 2 seconds for availability. I think for now on the application side I will attempt to connect to the local instance and, if that fails, connect to haproxy (which will effectively be the other instance at this point).
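
The "local first, then haproxy" connection logic might look something like this. It is a Python sketch with the actual database driver left out: connect is whatever function opens a connection, and the endpoint addresses are placeholders.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

Conn = TypeVar("Conn")
Endpoint = Tuple[str, int]

# Placeholder addresses: the local mysql instance first, then the haproxy listener.
ENDPOINTS: Sequence[Endpoint] = (("127.0.0.1", 3306), ("127.0.0.1", 3307))


def connect_with_fallback(connect: Callable[[str, int], Conn],
                          endpoints: Sequence[Endpoint] = ENDPOINTS) -> Conn:
    """Try each endpoint in order and return the first connection that opens."""
    last_error: Optional[Exception] = None
    for host, port in endpoints:
        try:
            return connect(host, port)
        except OSError as exc:
            # A real driver may raise its own error type; catch that here instead.
            last_error = exc
    raise ConnectionError("no MySQL endpoint reachable") from last_error
```

Because haproxy is already health checking both servers, the second attempt should only ever land on an instance that is up.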

I'm not sure how this will be extended in the future, however this allows options for most paths and I can see mysql stats along with the other servers in one place.

Setup Continues

Apache has been stripped down to the basics. Nearly all modules have been removed except PHP5, which has reduced the memory usage significantly.

I have decided to use nginx to serve static content and this has now been setup on both linodes. I have worked out how to deal with both servers generating static files and how to sync them between the two linodes.

Redis will be used to monitor the most active elements of the website. A key will be created for each item ID accessed, with an expire time. If another request is made for the same item within X minutes, the key value will be incremented like a hit counter and the expire time renewed. This will provide very efficient monitoring of the most active items on the site at all times.

The redis activity monitor will initially be used to populate a 'what's hot' view on the website but also to increase the rate at which active items are recached.
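
To make the counter idea concrete, here is an in-memory Python stand-in for what Redis would be doing (increment the key, then renew its expiry). It is not Redis code; the ActivityMonitor class and its method names are invented for the sketch.

```python
import time
from typing import Callable, Dict, List, Tuple


class ActivityMonitor:
    """Per-item hit counter where each hit renews the item's expiry window."""

    def __init__(self, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window_seconds
        self.clock = clock
        self._hits: Dict[str, Tuple[int, float]] = {}  # item -> (count, expires_at)

    def record(self, item_id: str) -> int:
        """Count one access and return the item's current hit count."""
        now = self.clock()
        count, expires_at = self._hits.get(item_id, (0, now))
        if expires_at <= now:
            count = 0  # the key had expired, so the counter starts again
        count += 1
        self._hits[item_id] = (count, now + self.window)
        return count

    def hottest(self, n: int = 10) -> List[Tuple[str, int]]:
        """Return up to n live items, most hits first."""
        now = self.clock()
        live = [(item, count) for item, (count, exp) in self._hits.items() if exp > now]
        return sorted(live, key=lambda pair: pair[1], reverse=True)[:n]
```

Items that go quiet simply fall out of hottest() once their window passes, which is the behaviour the key expiry gives for free in Redis.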

Additionally I've changed all ssh logins to public/private key to remove password vulnerabilities.
I got a tip from the very helpful guys in the linode.com IRC chat about cache files and syncing. They recommended writing the cache to a temp file name then renaming rather than copying - this is very efficient all round.
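
The temp file then rename trick, sketched in Python (the function name is mine). The point is that a rename within the same filesystem is atomic, so a reader or a sync run never sees a half-written cache file.

```python
import os
import tempfile


def write_cache_file(path: str, data: bytes) -> None:
    """Write data to a temp file beside path, then rename it into place."""
    directory = os.path.dirname(path) or "."
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # Atomically swaps the new file in, replacing any existing one.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```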

Firewalls have been configured, everything except http and ssh has been locked out on the public interfaces. Private interfaces allow a little more, but not much.

Friday, 9 October 2009

It's alive!

HAProxy is installed and balancing the load between servers. The monitoring is very very good and very quick.

Apache and php5 are installed, I might play with lighty another time but there are so many things to configure and well apache is the enemy you know. I will be stripping it down to the bare essentials shortly, it's a stock install currently.

Configuration

I picked a host and setup 2 server instances. I really like this host. Their control panel is well thought out and I was up and running with my own choice of linux in minutes.

The host had advised going with heartbeat instead of wackamole which I wasn't keen on because I'm a strong fan of Theo and George from Omni IT who recommend wackamole. However, my linux skills are not the best on earth and heartbeat was available using apt-get.

There is a bug in the install available which took me hours to work out, although I did feel like a proper linux geek when I did. There is a requirement to put your subnet and adaptor name in the haresources file.

We now have failover at the IP level for the two servers. This brings into question the use of DNS failover as realistically the IP failover is much faster and it's unlikely the DNS system will ever notice it. For the sake of a few dollars a year maybe it's better to leave that option open.

I had to purchase an additional domain to make it work but it took seconds with the nice control panel, it remembered my billing options and it even pro-rated the charge, so my $1 ip is only 75 cents as we are already part way into the month. You can't say fairer than that!

Heartbeat has the ability to call scripts on failover events and I should probably make use of this at some point. However I have nothing else running at the moment, the next decision is load balancing. The DNS load balancing is as moot a point as DNS failover, there is only one public IP so there is nothing to balance at the DNS level currently.

Lots to think about.

Thursday, 8 October 2009

Decisions

haproxy for load balancing and wackamole for server IP juggling seem to be winning the debate

Overall Architecture

I thought I'd write a bit about the planned architecture for predict.ly - or at least the plan so far.

DNS

The architecture starts at the DNS level, utilising IP Anycast for redundant DNS servers. This will be sourced from an existing vendor with good reviews - still to pick. Utilising round robin and monitoring, the DNS system will distribute the traffic for each domain to the 2 servers, providing basic traffic load balancing - or more correctly termed, load distribution.

The DNS service can determine if a server is offline and stop sending people to it within 5 minutes. I found some results posted about the effectiveness of this and they suggest 80% of users will fail over within an hour.

While this isn't bullet proof, it's better than nothing. Combined with round robin DNS, only the 50% of people pointed at the dead server would need to fail over, and 80% of those would within the hour, so you would be operating at 90% at the dns level.
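
The 90% figure is just 50% unaffected plus 80% of the other 50%. As a tiny Python check, with the 80% failover rate taken from the results mentioned above and the function name my own:

```python
def fraction_served(servers: int, failed: int, failover_rate: float) -> float:
    """Share of users still reaching a working server after some servers fail.

    Assumes round robin spreads users evenly across all servers.
    """
    unaffected = (servers - failed) / servers
    affected = failed / servers
    return unaffected + affected * failover_rate
```

fraction_served(2, 1, 0.8) works out as 0.5 + 0.5 * 0.8 = 0.9.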

IP Level

At the IP level each server will monitor the other and if one becomes unavailable, the other will assume that IP address. This pretty much removes the need for the DNS failover however having it in place will allow for scaling later such as multi geographic location hosting.

This monitoring should be able to detect and recover an unserviced IP within 2 minutes.

Load Balancing

Still looking at solutions, however the likelihood is that this will be in the form of an http proxy distributing requests to http daemons on both servers. If the solution merely distributed traffic without some intelligent balancing, then there might not be much point given the DNS balancing. However I am hoping to find something that will balance traffic out based on server load, opting for the least loaded.

MySQL

A basic master-master mysql setup using a meter free gigabit vpn should provide failover and high availability. Tests so far have been really encouraging.

Filesystem

I would really like a replicating filesystem but the setup looks like a nightmare. The advice seems to be go with rsync and that is probably what I will do.

Cache System

I am still looking into the cache system. It will likely be a mix of filesystem and memcached. Additionally there will be a small custom java server that will be used to aggregate writes to the database to reduce load.

Web Server

I'm looking at lighty for the web server running php with fastcgi. The question of op-code caches has yet to be resolved, I don't think I can afford the memory requirements.

Budget

Currently looking at $500 annually.

Mysql Master Master Replication

Just got master-master replication working in mysql. It has been surprisingly easy so far however there are various warnings about recovering from faults.

I ran some basic inserts, shut one of the servers down, ran some more inserts and restarted the server. Everything was synced and beautiful almost immediately.

I've decided to handle MySQL balancing in the app rather than using MySQL-proxy. Facebook do it this way and it makes sense to me. Feeling very pleased, it's nice when things just work.