Dec 21, 2009

Lessons from Facebook:

FB may not be the best example of quality design (most of us probably see at least one error message a day).

However, since it's the 2nd largest site in the world with constant exponential growth, there are good lessons to learn. Most interesting: their tech team is only about 250 people running some 30K servers. Pretty amazing.

Not long ago Jeff Rothschild, the Vice President of Technology at Facebook, gave a presentation at UC San Diego. You can find a detailed summary by Prof. Amin Vahdat, but I'll highlight several comments that I found useful.
  1. FB software development: mostly PHP (compiled), but other languages are used as well.
  2. A common interface between services, based on an internal development that was turned open source: Thrift. This interface enables easy connections between the different languages.
  3. A logging framework that does not depend on a central repository or its availability. FB is using Hadoop as well as Hive (which was also developed there). The logs are growing by 25TB a day.
  4. Operational monitoring: separated from the logging mechanism.
  5. The LAN can be a bottleneck as well: expect packet loss and packet drops in the LAN if you stress it too much.
  6. CDN: Facebook is using an external CDN for image distribution.
  7. A dedicated file system named Haystack that combines simple storage with a cache directory: the file system is accessed only once to fetch an image, while the directory structure is retrieved from the cache.
  8. Most data is served from Memcached. The database is used mostly for persistence and data replication between sites (Memcached is heated by MySQL itself):
    1. Top challenge: keeping data consistent, since Memcached can easily become inconsistent (there is no way to search for keys).
    2. Mixing information of different sizes and types is better, making sure that CPU, memory, and other loads are distributed equally.
  9. Shared nothing - keep your system components independent and avoid a single bottleneck. Therefore, data has been saved in sharded MySQL from day one. However, MySQL is used mostly for data persistence rather than a conventional database usage pattern:
    1. No joins in MySQL.
    2. Chosen due to good data consistency + management software.
    3. 4K servers.
    4. Data replication between sites is based on MySQL replication.
    5. Memcached is heated from the MySQL replication stream using a custom API.
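The sharding and cache-heating pattern described above can be sketched in a few lines. This is a minimal illustration and not Facebook's actual code: the shard count, the key format, and the in-memory stand-ins for MySQL and Memcached are all assumptions for demonstration.

```python
# Sketch of the pattern above: shard selection by user id, with Memcached as
# the primary read path and the (sharded) database used mainly for persistence.

N_SHARDS = 4

# Stand-ins for N sharded MySQL instances and a memcached pool.
mysql_shards = [dict() for _ in range(N_SHARDS)]
memcached = {}

def shard_for(user_id):
    """Pick the owning shard deterministically from the user id (no cross-shard joins)."""
    return mysql_shards[user_id % N_SHARDS]

def write_profile(user_id, profile):
    """Writes go to the owning shard; the cache is then 'heated' with the new
    value, mirroring the replication-driven cache heating described above."""
    shard_for(user_id)[user_id] = profile
    memcached['profile:%d' % user_id] = profile

def read_profile(user_id):
    """Reads are served from Memcached; the database is only a fallback."""
    key = 'profile:%d' % user_id
    if key in memcached:
        return memcached[key]
    profile = shard_for(user_id).get(user_id)
    if profile is not None:
        memcached[key] = profile  # re-heat the cache on a miss
    return profile

write_profile(42, {'name': 'alice'})
print(read_profile(42)['name'])  # served from memcached, not the shard
```

Note how a read that misses the cache falls through to exactly one shard, so no request ever fans out across the whole database tier.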
One last thing: if you are interested in Facebook financials, as well as the storage machines (low-end NetApp 3070), sizing, traffic, servers, storage procurement, and data center costs, take a look at TechCrunch's Michael Arrington post.

Keep Performing,
Moshe Kaplan. Performance Expert.

Dec 20, 2009

Expect the Best. Be Prepared for the Worst.

A major issue in preparing for disaster is getting clear visibility into your system, especially if we are talking about a distributed, multi-server, multi-data-center system.
Paul Venezia from InfoWorld provided a clear and concise description of the major open source tools (yes, free ones) that can give you clear visibility into your system. These tools can help you better know:
  1. When does your CPU utilization get close to 100%?
  2. When should you add more hardware or make an effort to boost your system performance?
  3. What are your high traffic sources?
  4. What is the root cause of poor system performance?
  5. What are the long term trendlines?
Among the described tools you can find:
  1. Cacti - a system monitoring web GUI that provides graphs for every system metric that can be exposed via SNMP. The most recommended tool.
  2. Nagios - a network monitoring tool.
  3. NeDi - a physical network mapping tool (to which port is your server connected?). Useful mostly if you are in the on-premise business rather than the cloud business.
  4. Ntop - a traffic analysis tool that can help you find your traffic sources, and which ones keep your bill so high.
  5. Pancho - configures and backs up Cisco routers. Again, one for the on-premise guys.
Using at least some of these tools, you can better prepare for disaster, and often avoid it altogether.

Keep Performing,
Moshe Kaplan. Performance Expert.

Dec 19, 2009

Load Balancing Support in Dynamic Environments


The Mission
These days we face a challenging task: designing a very large system of scalable instances. Each of these instances may be in a different geographic location, and many of them are on-demand instances that are started and shut down on the fly.
Another requirement is that a given client must always be directed to a specific instance, due to a system restriction (round robin is not an option in this case).

One Step Further
Since the number of IP addresses on the internet is limited, we would like to use as few public IP addresses as possible. This can be done using a load balancer or a proxy.
At this stage we would like to avoid hardware load balancers in order to keep initial fixed costs minimal, but we may consider using them in the future.

Is Amazon's Cloud Load Balancer Service (AWS) an Option?

AWS EC2 instances are a feasible option for hosting on-demand instances. However, AWS charges $0.025 per load balancing rule per hour (plus traffic), which comes to about $18 per rule per month. It can therefore be used, but for a large number of rules (>7) or high traffic, better deals can be found in the market.

So What Can Be Done?
We are left with software load balancers. The major ones are Apache mod_proxy and HAProxy. Supporting a large number of instances behind the load balancer can be done in one of the following two ways:
  1. Pre-register a large number of DNS addresses (subdomains) and associate them with the load balancer IP. The load balancer will simply redirect each request to the defined instance, based on a simple rule in the load balancer. Pros: simple. Cons: not fully dynamic; requires additional DNS registrations once in a while to keep up with application growth.
  2. Performing ProxyPass in the load balancer: every request includes an instance identifier in its path. This method does not require mass DNS declarations, but it requires specific definitions in the load balancer that may be more CPU consuming. In Apache the definition is pretty trivial; however, that product is less scalable than HAProxy. In HAProxy the task is done in two phases: switching to the server and rewriting the URI:
    In order to switch to the server, you have to use ACLs to match the path,
    then a use_backend directive to select a server farm ("backend"). Your
    farm may very well support only one server if you want.

    Then in this "backend", you can use a rewrite rule ("reqrep") to replace
    the request line.

    This would basically look like this :

    frontend xxx
           acl path_mirror_foo path_beg /mirror/foo/
           use_backend bk_66 if path_mirror_foo

    backend bk_66
           reqrep ^([^:\ ]*\ )/instN/(.*)     \1/\2
           balance roundrobin
           server srv66

However, Willy Tarreau, the HAProxy author, who kindly provided this hint to me, recommends avoiding the second part (rewriting) because:

 1) it requires good regex skills which sometimes makes the configs hard
    to maintain for other people

 2) rewriting URIs in applications is the worst ever thing to do, because
    they never know where they are mapped, and regularly emit wrong links
    and wrong Location headers during redirections.

Willy Tarreau also advises that the best thing to do, clearly, is to configure your application correctly so it can respond with the real, original URI. Remapping can be used as a transitional setup to ease a graceful switchover, though. Bottom line: Pros: no DNS configuration and a fully scalable solution, with no dependence on DNS replication. Cons: CPU consuming and error-prone declarations.
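For comparison, the same path-based routing in Apache mod_proxy is a few lines of configuration. The instance paths and backend addresses below are hypothetical, a sketch rather than our actual setup:

    # /inst1/... is proxied to instance 1, with the prefix stripped.
    ProxyPass        /inst1/ http://10.0.0.11/
    ProxyPassReverse /inst1/ http://10.0.0.11/
    ProxyPass        /inst2/ http://10.0.0.12/
    ProxyPassReverse /inst2/ http://10.0.0.12/

Note that ProxyPassReverse rewrites the Location header of backend redirects back to the public path, which softens (but does not eliminate) the wrong-Location problem Willy describes.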

So, What to Choose?
The answer depends on your needs, and on your belief in your people's regex capabilities. We made our choice.

Keep Performing,
Moshe Kaplan. Performance Expert.

