Feb 21, 2010

Lecture: Extract The Traffic from the DB

A few days ago I gave a presentation at the AlphaGeeks meetup in Tel Aviv on NoSQL, Memcached, CouchDB, sharding and other buzzwords that help you extract the traffic from the database and boost your system performance. If you were not there, you can take a look at the presentation (English) or the recorded video (Hebrew).

The Presentation (English)
The Video (Hebrew)

Keep Performing,
Moshe Kaplan

Feb 17, 2010

Agile Tools for Agile Performance

These days we are investing in our team and turning it Agile. This way we expect to bring better products to market sooner.
We selected Agilo by Agile42 as our task, bug and Wiki product. This is a Trac-based product that has the following pros:
  1. It is based on Trac, so it includes all the common Trac features: road map, bugs, tasks and Wiki in a single product
  2. It is customized to Agile methodology including:
    1. White board (pro version)
    2. Sprints, Milestones
    3. Sprint Dashboard with Sprint Burndown, ticket closure rate and commitment charts
  3. It has a better UI than Trac
  4. It has great packaging for instant installation (a Trac instant installation can be found at BitNami).
  5. Its community version is free (there is also a pro version with several extra features such as the white board)
Some useful info if you choose Agilo:
  1. Agilo installation
  2. Installing Agilo as a Windows service
    1. Download the Windows Server 2003 Resource Kit from Microsoft
    2. Install the service according to Microsoft's instructions
    3. Update: in run.bat, change set VIRTUAL_ENV=%cd% to set VIRTUAL_ENV=%Agilo%. Then create an Agilo system variable whose value matches the path where the run.bat file is located.
  3. Avoid errors
    1. "The password file could not be updated. Trac requires read and write access to both the password file and its parent directory": change the TrustedInstaller and Users permissions on the tracenv directory.
    2. If you get an "acct_mgr.web_ui.MessageWrapper" error, open trac.db and run DELETE FROM session_attribute to solve the issue.
  4. Control your source code
    1. If your SVN repository is not on the Trac/Agilo machine, use svnsync to keep a local read-only copy of it, and then resync Trac:
      svnsync synchronize http://localhost/svn/project --sync-username slaveuser --sync-password tjohej --source-password password
      c:\Python25\Scripts\trac-admin.exe c:\projects\trac\project\ resync
  5. Modifications
    1. Change the maximum size of files that users attach to tickets and wiki pages using the max_size parameter in trac.ini (update: May 22, 2010).
  6. Control your sprint
    1. Add a priority field to the task type in order to support prioritization
    2. Associate bugs with the sprint in order to see both in the same view
    3. Agilo separates bugs from tasks (however, you probably manage both in the same sprint), so we created a report that covers all issues.
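Going back to the acct_mgr error in item 3: the DELETE statement can also be run from a short script instead of an interactive SQLite session. The following is a minimal sketch, assuming the Trac environment uses the default SQLite backend. The function name clear_session_attributes and the .bak backup suffix are illustrative choices, not part of Trac or Agilo. Since the statement removes every stored session attribute, the sketch copies the database first.

```python
import shutil
import sqlite3


def clear_session_attributes(db_path):
    """Back up a Trac SQLite database, then empty its session_attribute table.

    Returns the number of rows that were removed.
    """
    # Keep a copy first: the DELETE below drops all stored session attributes.
    shutil.copy2(db_path, db_path + ".bak")

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on an exception
            (count,) = conn.execute(
                "SELECT COUNT(*) FROM session_attribute"
            ).fetchone()
            conn.execute("DELETE FROM session_attribute")
        return count
    finally:
        conn.close()
```

A call would look like clear_session_attributes(r"c:\projects\trac\project\db\trac.db"), assuming the database sits under db\trac.db inside the environment directory. It is probably safest to stop the Trac service before running it.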
More goodies to follow,

Keep Performing,
Moshe Kaplan

Feb 16, 2010

Lecture: Memcached? SimpleDB? NoSQL? How the big boys handle massive query loads with non-SQL solutions

What: 5th AlphaGeeks Meetup
Where: Tushia 10, Tel Aviv, Israel
When: Wednesday, February 17, 2010, 18:30-21:30

In the 1st AlphaGeeks meetup I presented the sharding concept and how it can help you meet the requirements of systems that handle 1 billion events/day. This time we'll talk about the new, sexy and emerging NoSQL technologies and how they can help you meet these requirements.

Other lectures
- Amitay Dobo: On C# (4) and Mono.
Why C# is a kick ass language that can bridge traditional, functional and dynamic typing languages, and how it all works with the Mono project.
- Yuval Goldstein will summarize an international survey of 300 developers about their jobs, their salaries, professionalism and overall happiness.

Unleash Your Cloud Load Stress Monster

A common question when you prepare your system for the Slashdot effect is "How do I check that my system is capable of handling these numbers?"

Location. Location. Location.
You may have the following options:
  1. Buy a lot of hardware, set up a one-time (or recurring) lab and check your system capabilities. Pros: these are your own servers, and you will always find something to do with the extra servers. Cons: it will burn your budget, keep your staff busy night and day setting it up, and take anywhere from several days to months to get everything installed.
  2. Rent a lab and do your thing there. Pros: you don't need to put up that amount of money. Cons: you have to schedule the lab and make sure that the hardware and networking meet your needs and that software licenses are available. Most importantly, if you find a major issue on the first day, you will have to close the lab and schedule another test (and pay again).
  3. Set up a cloud-based lab. Pros: no setup fees, no need to schedule, no need to commit, and you can save your environment, shut it down, and turn it on when you need it again. Cons: you don't really own the servers, but hey, who really wants to own servers?
The Tools
OK, so we chose the cloud again. What about the tools? Should we choose HP Software LoadRunner or Radview WebLOAD? If so, get ready to write a six-figure check.
However, the smart choice is the open source tool Apache JMeter, which can generate HTTP stress (it's a world wide web world after all) at the price of $0. With this tool you build the stress scripts manually using drag and drop, parameter configuration and BeanShell scripts, and it supports visualization using graphs and reports (it also supports SOAP, HTTPS, LDAP, JMS, IMAP, JDBC...).
One last thing: JMeter supports a "bot network" (distributed) mode, where several JMeter instances load a single system and provide unified reporting.

Decisions. Decisions. Decisions.
So we have chosen the cloud environment (Amazon AWS currently provides the best offer) and JMeter. Now, just before launching instances, let's make several decisions that will help us keep costs as low as possible.

Getting Best Prices
  1. Windows or Linux: since JMeter is Java-based, it's platform independent, and Linux will be the smart choice.
  2. Spot prices: by using spot requests, you can save about 60% of your CPU cost, and 30% of the total cost.
  3. Install both the stress loaders and the system under test in Amazon to avoid paying for traffic.
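The two percentages in item 2 are related by simple arithmetic: a discount that applies only to compute shrinks the whole bill by the discount multiplied by compute's share of the bill. A small sketch of that relation follows. The function names are made up for illustration, and the compute share it prints is only what the two quoted figures imply together, not a number published by Amazon.

```python
def total_saving(compute_share, compute_discount):
    """Fraction of the whole bill saved when only compute gets a discount."""
    return compute_share * compute_discount


def implied_compute_share(total_saving_fraction, compute_discount):
    """Share of the bill that must be compute for both figures to hold."""
    return total_saving_fraction / compute_discount


# Figures quoted in the text: 60% off compute, 30% off the total bill.
share = implied_compute_share(0.30, 0.60)
print(f"Implied compute share of the bill: {share:.0%}")
print(f"Total saving at that share: {total_saving(share, 0.60):.0%}")
```

If compute is a smaller part of your bill (for example, when storage or traffic dominates), the total saving shrinks accordingly.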

Stop talking. Start Working.

Let's start with several basic steps:
  1. Download the client tools that will be used to connect over SSH from a Windows host:
    1. Download WinSCP
    2. Download PuTTY
    3. Download PuTTYgen
  2. Sign up to Amazon AWS and prepare your user:
    1. Sign up
    2. Generate a key pair from the AWS Management Console (this can also be done using the CLI if you prefer).
    3. Download the key pair (PEM file) and create a private key (PPK file) that can be used by PuTTY and WinSCP:
      1. Open PuTTYgen
      2. Conversions > Import Key to import your .PEM file
      3. Click on "Save private key" to create your private key file (PPK)
  3. Launch your instance and connect to it
    1. Launch an instance from an EBS-based image (if you prefer to keep your work for next time). Use a spot request to save some money. Please notice that the default Linux flavor is Fedora.
    2. Connect using WinSCP and the PPK file. For command-line access, start PuTTY from within WinSCP.
  4. Install JMeter and its dependencies:
    1. Download and install JMeter
      1. Download the JMeter tar file from the site using wget
      2. Extract the file using tar -zxvf file.tar.gz
    2. Download and install Java
      1. Download Java using wget from http://java.com/en/download/manual.jsp
      2. Make the Java installer executable: chmod a+x jre-6u18-linux-i586.bin
      3. Run ./jre-6u18-linux-i586.bin to install Java
    3. Set environment variables:
      1. Add Java to the path: PATH=$PATH:/etc/java/jre1.6.0_18/bin (update it according to the path where you installed Java).
      2. Add the current directory to the path: PATH=$PATH:./
  5. Launch JMeter
    1. jmeter -n -t my_test.jmx -l log.jtl
    2. You may find full details of this syntax in the Apache Jakarta JMeter page:
      1. -n: nongui mode
      2. -t: the script file you built before
      3. -l: the results file
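If you want to start the same non-GUI run from a script (for example, to launch it on several load generators), the command line above can be wrapped as follows. This is a minimal sketch: build_jmeter_command and run_jmeter are hypothetical helper names, and the default assumes the jmeter launcher can be found through PATH. Only the -n, -t and -l options from the text are used.

```python
import subprocess


def build_jmeter_command(test_plan, results_file, jmeter_bin="jmeter"):
    """Return the argument list for a non-GUI JMeter run."""
    return [
        jmeter_bin,
        "-n",                # non-GUI mode
        "-t", test_plan,     # the test script (.jmx) built beforehand
        "-l", results_file,  # the results file (.jtl)
    ]


def run_jmeter(test_plan, results_file, jmeter_bin="jmeter"):
    """Run JMeter and raise CalledProcessError on a non-zero exit code."""
    command = build_jmeter_command(test_plan, results_file, jmeter_bin)
    return subprocess.run(command, check=True)
```

For example, run_jmeter("my_test.jmx", "log.jtl") runs the same command as in step 5. Pass the full path to the launcher as jmeter_bin if it is not on PATH.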
Last Words:
Finally, we have a stress lab in the cloud. All that is left is writing your stress script, installing your system and starting to stress it...

Keep Performing,
Moshe Kaplan

Feb 9, 2010

Blocked Sessions In the Cloud

When you test your new software (or feature) in a new environment (e.g. installing your system in a cloud environment), you may face errors when you try to connect to your newly deployed service. What happened?
There are two options:
  1. You did not install your system correctly. You can verify this by connecting to the service from within the server (using Terminal Services in the Windows case). If it fails, start exploring the event log and your application log.
  2. Someone is blocking your sessions (probably a firewall). You can verify this by running netstat -na from your client command line and checking for SYN_SENT lines in the output. If such a line points to the server IP and port that your service uses, you definitely have a firewall in your way. There are several candidates for the blocker, each with its own check:
    1. Your computer's personal firewall. Probability: Low; Verification: try connecting from another computer.
    2. Your company firewall. Probability: Medium-Low; Verification: try connecting from another computer which is outside your company network (your favorite neighborhood cafe can be great).
    3. Your cloud provider firewall. Probability: High; Verification: if it's Amazon AWS, log in to the AWS Management Console, find the Security Group that your instance is linked to, and verify the Security Group rules.
    4. Your server firewall. Probability: High (if you are using Windows); Verification: check that the server firewall configuration allows incoming connections on the relevant ports.
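The SYN_SENT check from option 2 can be automated. The sketch below scans netstat -na output for connections to the server that are stuck in SYN_SENT. It assumes the Windows/Linux column layout, where the state is the last column, the foreign address comes right before it, and addresses are written as ip:port. The names find_syn_sent and check_blocked are illustrative only.

```python
import subprocess


def find_syn_sent(netstat_output, server_ip, server_port):
    """Return netstat lines stuck in SYN_SENT towards server_ip:server_port."""
    target = f"{server_ip}:{server_port}"
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect at least: proto, local address, foreign address, state.
        if len(fields) < 4 or fields[-1] != "SYN_SENT":
            continue
        if fields[-2] == target:
            hits.append(line.strip())
    return hits


def check_blocked(server_ip, server_port):
    """Run netstat -na locally and report half-open connections to the server."""
    output = subprocess.run(
        ["netstat", "-na"], capture_output=True, text=True, check=True
    ).stdout
    return find_syn_sent(output, server_ip, server_port)
```

Run check_blocked while the client is still trying to connect, since a SYN_SENT entry only exists during the connection attempt. A non-empty result for your server's IP and port suggests that something between the client and the service is dropping the packets.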
Keep Performing,
Moshe Kaplan

Feb 7, 2010

PHP Developer? Dance Like You Never Danced Before

Last week Facebook unveiled its latest technology: HipHop for PHP.

Why Should You Need It?
PHP is slow (relative to Java, .Net/C# and of course to compiled code like C/C++) since it is based on an interpreter. Faster means more actions using fewer CPU cycles. Fewer CPU cycles mean fewer servers, lower CO2 emissions and, some say most importantly, more money in the bank.
What Could You Do So Far?
There are several PHP accelerators on the market, like Alternative PHP Cache (APC), eAccelerator and Zend Optimizer+. These accelerators optimize the PHP intermediate code and cache data and the compiled code produced by the PHP bytecode compiler (very similar to turning C# into MSIL or Java into bytecode).
So What Is the News?
There was still a major performance gap between native (unmanaged) code and bytecode. This gap is closed by the new Facebook technology: HipHop transforms PHP source code into C++, which is then compiled into native code, and gains a major performance boost (see the attached image from the Facebook Blog).
Last words
You may think that this is useful only to a large site like Facebook with its 350M users. However, every site with dozens of servers will benefit greatly from this technology: fewer performance bottlenecks and a slashed number of servers (yes, again: money in the bank, CO2 emissions and operator time...).

Keep Performing,
Moshe Kaplan

