Dec 23, 2013

Can OpenStack Object Store be a Base for a Video CDN?

Video CDN are amazing from technical aspect as we are talking about high scale systems with some unique business cases.
I would like to share with you some design aspects regarding these systems.

The Video CDN Case Studies
Video CDN includes two main case studies:
  1. VOD Case Study: This is a long tail/high throughput scenario where you need a high capacity disks, where only a small portion of it will be used extensively. In order to create a cost effective solution you should have:
    1. High capacity storage system with low IOPS needs. Servers with 24-36 2-3TB SATA disks will provide a up to 100TB raw storage with a tag price of $15K.
    2. Replication and auto failover mechanism that can distribute content between several servers and can save us from using expensive RAID and Cluster solutions.
    3. Caching/Proxy mechanism that will serve the head of the long tail from the memory.
  2. Live Broadcast Case Study: This is a No Storage/high throughput scenario where you actually don't need any persistent storage (if a server fails, as soon as its get up again, the data will be no longer relevant). In order to create a cost effective solution you should have:
    1. No significant storage.
    2. High capacity RAM that should be sized according to:
      1. The number of channels you are going to serve.
      2. The number of resolutions your going to support (most relevant when you plan to support handhelds and not just widescreens).
      3. The amount of time you are going to store (no more that 5 min are needed in case of live, and no more than 4 hours in case of start over).
    3. In memory rapid replication (or ramdisk based) mechanism, that will replicate the incoming video to several machines.
    4. HTTP interface to serve video chunks to end users.
Serving Static Content rather than Dynamic
Modern Video Encoding systems (such as Google's Widevine) support "Encrypt Once, Use Many", where the content is encrypted once, and decryption keys are distributed to secured clients based on a need to know basis.

Why OpenStack Swift/Object Store?

In short, OpenStack Swift is the OSS equivalent for Amazon propriety AWS S3: "Simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web."

OpenStack Swift/Object Store Benefits
  1. Open stack is OSS and therefore easy to evaluate.
  2. Active large scale deployments including RackSpace and Comcast.
  3. Built in content distribution method that let you distribute the load between multiple servers based on rsync.
  4. High availability based on data replication between multiple instances. On a server failure, you just need to take it out of the array and replace with a new one, while other servers keep serving users.
  5. This mechanism help you avoid premium hardware such IO controllers and RAID mechanism.
  6. Built in HA and DRP based on 5 independent zones.
  7. Built in web server, that enables you serving static content as well as HTTP based video streams from the server itself, rather than implementing high end SAN.
  8. Built in reverse proxy service that minimizes IO and maximizes throughput, based on Python and Memcache.
  9. Built in authentication service
  10. Target pricing of $0.4 per 1M servings and $0.055/GB per month if we take AWS as a benchmark.
OpenStack Object Store Architecture
The OpenStack Object Store architecture is well described in two layers:
  1. The logical: Accounts (paying customers), Containers (folders) and Objects (blobs).
  2. The physical: Zones (Independent clusters), Partitions (of data items), Rings (mapping between URLs and partitions and locations on disks) and Proxies.
Key Concepts
  1. Account is actually an independent tenant, as it has its own data store (implemented by SQLite).
  2. Replication is done based on large blocks, quorum and MD5.
  3. Write to Disk before Expose to Users: When files are uploaded, they are first committed to disk at least at two zones, and then database is updated for availability (so don't expect sub second response for a write).
System Sizing 
Before testing the system you should be familiar with some key sizing metrics obtained by Korea Telecom:

  1. Object Store Server Sizing: High capacity storage (36-48 2-3TB SATA drivers that will provide us up to 100TB per server), memory to cache the head of the long tail (24-48GB RAM), 2x1Gbps Eth to support ~500 concurrent requests for long tail request. A single high end CPU should do the work.
  2. Proxy Server Sizing: Little storage (500GB SATA disk will do the work), memory to cache the head of the long tail (24 RAM), 2x10Gbps Eth to support ~5000 concurrent requests to support the head of the long tail requests.
  3. Switches: you will need a nice backbone for this system. In order to avoid a backbone that is too large, splitting the system to several clusters is recommended.
  4. Load Balancing: in order to avoid high end LB, you should use DNS LB, where frequent calls to the DNS are neglect-able relatively to the Media streaming
The Fast Lane: How to Start?
OpenStack Object Store is probably cost effective when looking for large installations as you may need at least 5 physical servers for object and containers store and another 2 for proxies. 
However, you can check the solution based on a single server installation (SAIO):

If you take the fast lane and AWS is the fast lane to your POC, feel free to use the following tips:

  1. In the initial installation, some packages will miss in yum so:
    1. sudo easy_install eventlet
    2. sudo easy_install dnspython
    3. sudo easy_install netifaces
    4. sudo easy_install pastedeploy
  2. No need to start the rsync service (just reboot the machine).
  3. Start the service using sudo ./bin/startmain
  4. Test the service using  the supload bash script to stimulate a client.
Working with the Web Services
There are 3 main ways to work with Swift web services:
  1. AWS tools, as Swift is compliant with AWS S3.
  2. HTTP calls as it is based on HTTP.
  3. Swift client that streams your major needs.
Working with Swift Client
In the following example we assume a given user test.tester was defined with the password testing, and data is served over the proxy at port 8080

Get statistics:
swift -A -U test:tester -K testing stat

Upload a file to the videos container:
sudo swift -A -U test:tester -K testing upload videos ./demo.wvm

Provide read only permissions (Please notice that r: is stands to referral domain, so specifying any other * can help you save bandwidth and minimize content stealing):
sudo swift -A -U test:tester -K testing post videos  -r '.r:*'

Download the file (where AUTH_test is the user account, videos is the containter and anonymous access was provided as detailed below):

Public Access
In order to implement a read only public access your will need to take care of the following items
  1. User Management
  2. Define anonymous accessdelay_auth_decision = 1
  3. Configure folder ACL
  4. Enable delayed authentication at the proxy configuration (/etc/swift/proxy-server.conf):
paste.filter_factory = keystone.middleware.auth_token:filter_factory
# Delaying the auth decision is required to support token-less
# usage for anonymous referrers (‘.r:*’).
delay_auth_decision = 1

Working with Direct HTTP Calls
Get user/pwd
curl -v -H 'X-Storage-User: test:tester' -H 'X-Storage-Pass: testing'

> X-Storage-Url:
> X-Auth-Token: AUTH_tk551e69a150f4439abf6789409f98a047
> Content-Type: text/html; charset=UTF-8
> X-Storage-Token: AUTH_tk551e69a150f4439abf6789409f98a047

Upload file
curl –X PUT -i \
    -H "X-Auth-Token: AUTH_tk26748f1d294343eab28d882a61395f2d" \
    -T /tmp/a.txt \

Bottom Line
OpenStack Object Store (Swift) is exciting tool for anyone who is working with large scale system and especially when talking about CDNs. 

Keep Performing,
Moshe Kaplan

Dec 14, 2013

Do You Really Need NoSQL and Big Data Solutions?

Big Data and NoSQL are the biggest buzz around...
Yet, are they the right solutions for your project?

Are You Eligible for Big Data?
As a rule of thumb, if your database is smaller than 100GB and your biggest table is less than 100M rows, you should avoid seeking Big Data solutions. In that case, make sure you well utilize your current RDBMS investments.

When Should You Choose RDBMS?
There are several other reasons to stick with RDBMS (yes, we are talking about MySQL, SQL Server, Oracle and other):
  1. You must meet compliance and security procedures (cases: PCI compliance).
  2. You need complex reporting based on joins between several tables.
  3. You need transactions (cases: financial transactions).
  4. You have established data analysts group that cannot be trained to other syntax.
When NoSQL Solutions Should Fit?
There are several high reasons to select NoSQL solution. Check if your case is eligible for it:

  1. You are a full stack developer and just look for a persistence storage (cases: blogging system, multi choice exams).
  2. You need a quick response from your storage solution based on a key value store (cases: algo trading and online stock exchanges bidding: DSP).
  3. You must always return an answer, even if it's not the most updated one (cases: social networks, content management).
  4. You need to provide a good enough answer rather the most accurate (cases: search engine).
  5. Your data size is too big to be transferred over network (even over a 80Gbps Infiniband). In this cases a better approach is distributing the computation (cases: analytics, statistics).

Bottom Line
NoSQL solutions have expanded your toolbox. Now, you need to focus on selecting the right tool for your business case.

Keep Performing,

Moshe Kaplan

Dec 9, 2013

JAVA Production Systems Profiling Done Right!

If you are facing a Java system performance issue in production, and JProfiler is not the right tool for it, probably JMX monitoring using the VisualVM will do the work for you.

JMX usage from a remote machine can be frustrating. Therefore, please make sure that:
  1. Your hostname is included in the /etc/hosts 
    1. Get host name using hostname 
    2. Add the host name after in /etc/hosts
  2. JMX is binded to the external IP:
    1. Verify is not presented at: netstat -na | grep 1099
    2. If it does presented, add to your java command: -Djava.rmi.server.hostname=
If everything is Okay, you will be able to run VisualVM from a remote machine and connect to the remote server.

Now, that you have your VisualVM up and running there are some items you should take a look at:
  1. General CPU and memory graphs.
  2. Sampler that enables you taking snapshots.
  3. Snapshot analysis that enables you a hotspot presentation as well as deep.
Bottom Lin
My recommendation is to have snapshot of the process and then look at the hotspots tab for major calls with actual long CPU time. You should focus on these items.

Keep Performing,

Dec 3, 2013

Is There a (Good) Solution for SQL Server HA @ Azure?

Comment: I do not see Windows Azure SQL Database as a feasible solution for a firm that expects its business to scale. The reason is simple: You can not use a component in your system that its replacement will require a long downtime (yes, we are talking about hours if you will have a significant database size). The only way to migrate from Windows Azure SQL Database is to export its data and import it on a regular instance, and it's not acceptable when you have a significant traffic.

High Availability
The requirement for high availability is common: you don't want downtime, as downtime mean less business and it hurts your business image. 

The Azure Catch
Azure SQL Server VM is just like having a SQL Server on a regular VM. 
VM maintenance includes two layers: 1) maintaining the VM (installing patches, hardening...) and 2) doing the same to the host underneath.  
A common large size private and public cloud operation usually include an auto fail over, so when a host is having maintenance or unfortunately fails, the system automatically migrate the running VMs to another host(s) w/o stopping them. You could find this behavior in VMWare VMotion and at Amazon EC2 that runs over XEN.
Well... this is not the case at Microsoft Azure. When  Microsoft updates its hosts, don't expect your instances to be available (and yes the downtime may take dozens of minutes and it is not controlled by you). This is an acceptable practice when dealing with web and application servers (place several instances behind a LB and use a queue mechanism to deal with it). However, it is not good one when you deal with databases it's can be a major issue.

The Solution: Have a Master-Master Configuration
As you may understood, a master-slave solution is not acceptable in this case, and therefore, you will need to avoid Log Shipping (although it can be used various other scenarios).
Therefore, we were left with two solutions:

This was a solution for HA architectures.
However "This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature. Use AlwaysOn Availability Groups instead."

AlwaysOn Availability Groups
This solution is described by MS as the "enterprise-level alternative to database mirroring. Introduced in SQL Server 2012".
However, "The non-RFC-compliant DHCP service in Windows Azure can cause the creation of certain WSFC cluster configurations to fail, due to the cluster network name being assigned a duplicate IP address (the same IP address as one of the cluster nodes). This is an issue when you implement AlwaysOn Availability Groups, which depends on the WSFC feature."

The Second Catch
According to our analysis it seems that both Mirroring (end of life) and AlwaysOn (severe bugs due to DHCP) are not recommended, so we actually left w/o good HA solution, and therefore with no good MS data store solution for the Azure environment.
We tried to get answers from Microsoft stuff, but we did not get good ones.

Bottom Line
When evaluating Azure as a cloud platform for your needs, you should consider your data solution and how it fits your needs. In this case you may need to consider some open source solutions such as MySQL, Cassandra and MongoDB on a Linux VM, instead of going the MS default stack.

Keep Performing,
Moshe Kaplan


Intense Debate Comments

Ratings and Recommendations