Jan 27, 2015

12 Ways to Boost Your Elasticsearch Performance

The ELK (Elasticsearch, Logstash, Kibana) stack is amazing.
In no time you can create a fully functional analytics service from data collection to dashboard presentation.

But what happens at scale? How can you make sure this blazing fast solution keeps serving your business team even when your data includes hundreds of millions of data points or more?

What to Focus on?
Elasticsearch performs two major tasks:
  1. Data load and indexing, which are CPU intensive.
  2. Search and queries, which are memory intensive.
You should design your system to match the pattern of your business case.

Step 1: Keep your version up to date
Elasticsearch is a relatively young tool, and the team delivers new features and fixes rapidly, so make sure you keep up with the latest versions.

Step 2: Tune Your Memory
The Elasticsearch heap should be about 50% of your machine's memory. Set it using the ES_HEAP_SIZE environment variable (2G for example): export ES_HEAP_SIZE=2G
Note: This method will probably not work when Elasticsearch runs as a service, because the init.d script overrides it. In that case, set the ES_HEAP_SIZE=2g parameter in /etc/init.d/elasticsearch instead.
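
The 50% rule is easy to turn into a number for a given machine. The following is a minimal sketch, not an official tool: the function names are made up for illustration, and it assumes you already know the machine's physical RAM in MB.

```python
def recommended_heap_mb(total_ram_mb: int, percent: int = 50) -> int:
    """Return the heap size in MB as a share of physical RAM (50% by default)."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    return total_ram_mb * percent // 100


def heap_size_setting(total_ram_mb: int) -> str:
    """Format the heap size as a JVM-style value in megabytes, e.g. '2048m'."""
    return f"{recommended_heap_mb(total_ram_mb)}m"


if __name__ == "__main__":
    # A hypothetical 4GB machine gives the same 2G heap used in the example above.
    print(f"export ES_HEAP_SIZE={heap_size_setting(4096)}")
```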

Step 3: Select Your Storage
Disks are crucial when your data is larger than your memory. Choose local SSD disks. They will cost less and perform better.

Step 4: Stripe Your Data
Use path.data and path.logs to stripe your data and logs on multiple disks to gain more IOPS.

Step 5: Prepare for Index Merging
Index merging is probably the most frustrating process in Elasticsearch. It is required to keep your system performing well in the long run, but it can cause short bursts of high resource utilization. To protect itself, Elasticsearch throttles merging to 20MB/s. If the cluster serves as your back office system, you can disable this throttling by setting index.store.throttle.type to none.

Step 6: Plan for Bulk Loading
Like any other data solution, you should load data in bulks when possible to speed up your load and minimize resource utilization. This is why you should check out the Bulk API.
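
The Bulk API takes a newline-delimited body: one action line, then one document line, for each document, with a trailing newline at the end. The sketch below only builds that body and does not send it. The helper name, index name and type name are hypothetical, and the `_type` field reflects the Elasticsearch versions available when this post was written.

```python
import json
from typing import Dict, Iterable


def build_bulk_body(index: str, doc_type: str, docs: Iterable[Dict]) -> str:
    """Build a newline-delimited body for an 'index' bulk request."""
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch where the next document goes.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk format requires every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    events = [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 7}]
    print(build_bulk_body("events-2015.01.27", "event", events), end="")
```

The resulting string is what you would POST to the _bulk endpoint. Sending a few hundred to a few thousand documents per request is a common starting point, but the right batch size depends on your document size and cluster.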

Step 7: Optimize Your Index
Run optimize on your index when it is stable (for example, after a daily load) to ensure the best performance.

Step 8: Enlarge the File Handle Limit
Like other data solutions, Elasticsearch uses a large number of file handles. Make sure to add the following settings to /etc/security/limits.conf:
*     soft    nofile          64000
*     hard    nofile          64000
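
After changing the limits and logging in again, it is worth verifying that they actually apply. A minimal sketch using Python's standard resource module (Unix only) follows. It reports the limits of the process running the script, so it assumes you run it as the same user that runs Elasticsearch. The helper names are made up for illustration.

```python
import resource
from typing import Tuple


def nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def meets_target(limit: int, target: int = 64000) -> bool:
    """True if the limit is unlimited or at least the target value."""
    # RLIM_INFINITY means "no limit", which always satisfies the target.
    return limit == resource.RLIM_INFINITY or limit >= target


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} ok={meets_target(soft)}  hard={hard} ok={meets_target(hard)}")
```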

Step 9: Make RAM Space for Your Indexes
Elasticsearch is optimized for machines with over 10GB of RAM, as its default room for indexing (the index buffer) is 10% of its memory. Since the best practice is to have at least 512MB for the index buffer size, if your system is smaller than that, make sure you set the index buffer size explicitly in /etc/elasticsearch/elasticsearch.yml
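
The arithmetic behind the 10GB figure can be checked quickly. The sketch below assumes the heap is 50% of RAM (as in Step 2) and reads the default index buffer as 10% of that heap. Both percentages and the function names are assumptions for illustration, so adjust them to your own setup.

```python
def default_index_buffer_mb(ram_mb: int, heap_percent: int = 50,
                            buffer_percent: int = 10) -> int:
    """Estimate the default index buffer in MB for a machine with ram_mb of RAM."""
    heap_mb = ram_mb * heap_percent // 100
    return heap_mb * buffer_percent // 100


def needs_explicit_buffer(ram_mb: int, min_buffer_mb: int = 512) -> bool:
    """True if the default buffer falls short of the 512MB best practice."""
    return default_index_buffer_mb(ram_mb) < min_buffer_mb


if __name__ == "__main__":
    for ram_mb in (4096, 8192, 10240, 32768):
        buffer_mb = default_index_buffer_mb(ram_mb)
        print(f"{ram_mb}MB RAM: default buffer {buffer_mb}MB, "
              f"explicit setting needed: {needs_explicit_buffer(ram_mb)}")
```

Under these assumptions a 10GB machine gets a 5GB heap and therefore a 512MB buffer, which is where the threshold comes from.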

Step 10: Change Mappings
By default, Elasticsearch keeps some mapping fields that you may be able to disable in your case to save disk space and memory and to boost performance:
  1. The _source field, which stores the original data
  2. The _all field, which combines all fields into a single one so you can search across any of them
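
As a rough illustration, the body of an index creation request that turns both fields off could be built like this. The helper and the type name are hypothetical, and the layout follows the mapping syntax of the Elasticsearch versions available when this post was written, so check it against your version before using it.

```python
import json
from typing import Dict


def lean_mapping(doc_type: str, keep_source: bool = False,
                 keep_all: bool = False) -> Dict:
    """Build an index creation body with _source and _all optionally disabled."""
    return {
        "mappings": {
            doc_type: {
                "_source": {"enabled": keep_source},
                "_all": {"enabled": keep_all},
            }
        }
    }


if __name__ == "__main__":
    # "event" is a placeholder type name.
    print(json.dumps(lean_mapping("event"), indent=2))
```

Keep in mind that without _source you can no longer retrieve the original document from Elasticsearch or reindex from it, so disable it only when another system holds the raw data.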

Step 11: Add Monitoring
You can either choose Marvel, the ELK management tool with the Kibana look that is part of the Enterprise package, or build your own monitoring using open source or hosted solutions like New Relic.

Step 12: Sharding
If none of this works, start sharding and adding nodes to your system.

Bottom Line
Elasticsearch is an amazing tool, and with the right configuration it can keep serving your analytics needs even at the scale of billions of events.

Keep Performing,
Moshe Kaplan

Jan 16, 2015

Offloading SSL using AWS ELB

If you are using the AWS Elastic Load Balancer (ELB) to scale your system, you may find that it is a good place to offload SSL termination from your servers.

Why Should You Offload SSL Termination?
HTTPS is an encrypted protocol, and encryption requires significant CPU time to perform the needed mathematical computations.
Since most web applications are CPU bound, you should avoid processing SSL on your servers.

Why Is AWS Elastic Load Balancer (or Any Other LB) a Great Candidate?
In order to perform load balancing, the load balancer must decrypt the traffic and read its content. This is done by placing your certificate on the load balancer.
If you consider the network between your LB and your servers to be secure, you can avoid re-encrypting the traffic and keep it in clear text.

How Can I Make Sure Traffic is Actually Secured?
In some cases, you want all your users to use HTTPS as an encrypted channel in order to protect their privacy and prevent eavesdropping and injection.
In these cases, you want to catch traffic that did not use HTTPS before it was terminated at the LB and redirect it to HTTPS. This can be done by evaluating the X-Forwarded-Proto request header in your .htaccess or Apache configuration:
RewriteEngine On
RewriteCond %{HTTP:X-Forwarded-Proto} !https [NC]
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
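
If you terminate requests in application code rather than in Apache, the same decision can be made there. The following is a minimal sketch of the logic in the rule above, not a drop-in for any specific framework. The function name is hypothetical, and it assumes the headers arrive as a plain mapping that uses the exact key "X-Forwarded-Proto".

```python
from typing import Mapping, Optional


def https_redirect(headers: Mapping[str, str], host: str,
                   request_uri: str) -> Optional[str]:
    """Return the HTTPS URL to redirect to (with a 301), or None if no redirect is needed."""
    proto = headers.get("X-Forwarded-Proto", "")
    # Mirror "!https [NC]": a case-insensitive check for "https" in the header value.
    if "https" in proto.lower():
        return None
    # Header missing or set to http: send the client to the HTTPS version of the URL.
    return f"https://{host}{request_uri}"


if __name__ == "__main__":
    print(https_redirect({"X-Forwarded-Proto": "http"}, "www.example.com", "/cart?id=1"))
    print(https_redirect({"X-Forwarded-Proto": "https"}, "www.example.com", "/cart?id=1"))
```

Note that, like the Apache rule, this also redirects requests that carry no X-Forwarded-Proto header at all, such as those that bypass the load balancer.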

Bottom Line
A careful design can help you get more out of your web servers.

Keep Performing,
Moshe Kaplan

