Varnish improves web performance and security

  
  
  

Varnish is a standalone HTTP accelerator that provides efficient and powerful web caching mechanisms. In addition to performing web caching, Varnish can also act as a web application firewall (WAF) and a load balancer, and it can be integrated with any back-end web server.

Varnish is supported in all popular Linux distributions. In CentOS it is available from the EPEL repository and through the official Varnish repository. The latter is preferred because it's more frequently updated and more likely to have the latest version.

At the time of writing, the latest version of Varnish is 3.0. To install the Varnish repository, run the command rpm --nosignature -i http://repo.varnish-cache.org/redhat/varnish-3.0/el5/noarch/varnish-release-3.0-1.noarch.rpm. After that, install Varnish itself with the command yum install varnish. Make sure that the newly installed service starts and stops with the system by adding it to the system runlevels with the command chkconfig varnish on. You should now be all set; in the future you can update the package when you update the rest of your software by running yum update.

Basic configuration

The file /etc/sysconfig/varnish defines the daemon's basic options, such as listening ports and storage options. The most important options are defined by the directive DAEMON_OPTS. You can choose among three predefined options for this variable: minimal configuration, configuration with Varnish Configuration Language (VCL), and advanced configuration. We'll choose the second option, to use VCL, because VCL is what makes Varnish so powerful and customizable, as we'll see shortly.

In the file /etc/sysconfig/varnish, uncomment the second configuration option so that it looks like this:

DAEMON_OPTS="-a :6081 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -u varnish -g varnish \
             -S /etc/varnish/secret \
             -s file,/var/lib/varnish/varnish_storage.bin,1G"

Make sure that anything below this is commented out and thus remains inactive.

The first value above, -a :6081, is the listening port. This means that to reach your site through Varnish you must connect to port 6081 instead of 80, as in http://example.org:6081. This lets you test Varnish without having to make any changes to your web server. Just don't forget to allow incoming requests to this port in your firewall (/sbin/iptables -I INPUT -p tcp --dport 6081 -j ACCEPT).

Later, once you have everything configured to your liking, you will want Varnish to start intercepting web requests on port 80. Just changing the listening option here is not enough, because your web server is probably listening on port 80 already. To change Apache to listen on a different port, such as 8080, change the option Listen in the file /etc/httpd/conf/httpd.conf, and set the same port in the back-end definition of Varnish's VCL file so that Varnish can still reach the web server.

Sometimes it's not easy or safe to reconfigure your web server's listening port, especially if you have multiple vhosts. In such cases you can turn to iptables for help. The following command redirects all incoming requests on TCP port 80 on the interface eth0 to TCP port 6081: /sbin/iptables -I PREROUTING -t nat -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 6081. iptables can help you integrate your web server with Varnish quickly and seamlessly.

The next configuration option is -T localhost:6082. It instructs Varnish to provide a simple command-line admin interface on the local interface (127.0.0.1) on TCP port 6082. You can connect to it with the command varnishadm.

The option -f /etc/varnish/default.vcl specifies the default VCL file, which we'll get to in a moment.

Varnish works under its own user and group, both named varnish and defined with the options -u varnish -g varnish.

In order to access the admin interface you have to authenticate with a secret passphrase. The option -S /etc/varnish/secret defines the path to the file that contains the secret, which is used by both the admin server and the client.

The last and most important option (-s) is for the cache storage. The default storage type is file, meaning cached content is stored on the filesystem in the file /var/lib/varnish/varnish_storage.bin. In the configuration above its size is limited to 1GB. This is a rather conservative setting, and because the storage is file-based, reads and writes are likely to be slower than with memory. For better results you can keep the cache in RAM (referred to as "malloc" in Varnish configuration) by specifying the storage option as -s malloc,1G. You can also increase the storage size, taking into account your system's resources.

Save the configuration file and start Varnish for the first time with the command service varnish start.

Advanced configuration

Varnish lets you use VCL to fine-tune its configuration. VCL supports simple programming features such as constants, conditional statements, and subroutines. As per our main configuration, the VCL file is /etc/varnish/default.vcl.

The first definition in the VCL file is the default back end, which is configured with the backend definition, and looks like this:

backend default {
  .host = "127.0.0.1";
  .port = "80";
} 

This configuration assumes Varnish works on the same host as the web server. If this is not the case you can use a remote IP address for the host value.
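
As a minimal sketch of a non-default back end, the definition below assumes Apache has been moved to port 8080 as described earlier, and adds two optional timeouts. The port and the timeout values are illustrative assumptions, not recommendations.

```vcl
backend default {
  .host = "127.0.0.1";          # or the IP address of a remote web server
  .port = "8080";               # assumed: the port Apache now listens on
  .connect_timeout = 1s;        # assumed value: stop trying to connect after 1 second
  .first_byte_timeout = 30s;    # assumed value: wait up to 30 seconds for the first byte
}
```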

VCL provides three important predefined subroutines that are responsible for different parts of the communication flow between the client, Varnish, and the backend server(s). vcl_recv handles requests from the client. It lets you alter a request or take certain actions if conditions are met. vcl_fetch is activated after the response from the back end has been received, and allows you to manipulate the response. In doing so you may change whether a response should be cached or not depending on its content. vcl_deliver allows you to make final changes to the content object that goes to the client.
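
The skeleton below is a rough sketch, in Varnish 3.0 syntax, of where each of the three subroutines sits in the flow. The /admin/ path, the file extensions, the one-hour TTL, and the X-Cache header name are all assumptions chosen for illustration.

```vcl
sub vcl_recv {
    # Runs when a client request arrives, before the cache lookup.
    # Illustration: always send the (assumed) admin area straight to the back end.
    if (req.url ~ "^/admin/") {
        return (pass);
    }
}

sub vcl_fetch {
    # Runs after the back end has responded, before the object is stored.
    # Illustration: keep static assets for one hour (pattern and TTL are assumptions).
    if (req.url ~ "\.(css|js|png|gif|jpg)$") {
        set beresp.ttl = 1h;
    }
}

sub vcl_deliver {
    # Runs just before the response is sent to the client.
    # Illustration: expose whether the object was served from the cache.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

Wherever a subroutine reaches its end without a return statement, Varnish continues with its built-in logic for that subroutine.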

The file /etc/varnish/default.vcl contains many examples of VCL use that are commented out and not active by default, and VCL's documentation contains more information and examples.

Optimizing Varnish and increasing the cache hit rate

Just starting Varnish and using it as a proxy for a back-end web server does not guarantee better performance. It might do little if any caching, because very often web applications themselves are designed to do caching, though they are not always effective. Also, often clients request unique data even though the requests' uniqueness is determined only by a unique cookie. By default, Varnish respects all of the above cases and does little caching.

To see how much caching Varnish does, use the command /usr/bin/varnishstat and check the second and third rows of the output:

Hitrate ratio:       10       21       21
Hitrate avg:     0.9927   0.9889   0.9889

varnishstat computes its statistics in real time rather than from logs. The Hitrate ratio row shows the length, in seconds, of three averaging periods, which are nominally 10, 100 and 1,000 seconds. In the above example Varnish has run for only 21 seconds, which is why the second and third columns show 21 instead of 100 and 1,000.

Hitrate avg shows how effective caching was during each period. You should aim to make the hitrate avg value as close as possible to one; if the value is close to zero it means that little or no caching is being done.

If you find your cache hit ratio is low, start investigating the HTTP headers of your web pages. You can use Lynx, the console web browser available in all Linux distributions, to dump the headers with the command lynx -dump -head followed by a URL:

$ lynx -dump -head http://192.168.1.202/test.php
...
X-Varnish: 479098342
Age: 0
Via: 1.1 varnish
...

The Age header is the first indicator to look at. As per RFC 2616, section 13.2.3 (Age Calculations), Age is the sender's estimate of the number of seconds since the origin server generated the response. The first time a new URI is accessed, this number is zero.

If the Age value doesn't increase with each subsequent request, then caching is not working and each request goes to the origin server. When this happens Varnish works as a simple reverse proxy.

Common reasons for Varnish not to serve a cached page are that the request contains cookies, the response sets cookies, or the response's expiry time is in the past, leaving it with no remaining time to live (TTL).

The best way to resolve the above cases is through your web application – increase TTL and remove unnecessary cookies. In general developers implement cookies much more often than necessary. Under normal circumstances a website should set cookies only for logged-in users to differentiate them.

If you are using a popular web application, it probably already has plugins to improve performance. For example, WordPress has W3 Total Cache, which addresses these issues and works perfectly with Varnish.

If you cannot change the way your application works, you can use Varnish to manage cookies, for example by using the vcl_fetch subroutine to remove cookies from the back-end server's response.

On one hand, changing how cookies work is risky and sometimes not feasible for sites like forums and shopping carts. In these cases you want to track and differentiate your visitors all the time.

On the other hand, it's safe to drop all cookies for blogs and business sites where cookies are needed only for the admin back end. As an example, imagine a site that allows logins only for admin users via http://example.org/admin/. In such a case, use a conditional statement with a regular expression match for the request URL (req.url), such as:

sub vcl_fetch {
    if ( !( req.url ~ "^/admin/") ) {
        unset beresp.http.set-cookie;
    }
}

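Because this subroutine does not end with a return statement, it does not replace the default logic of the predefined vcl_fetch routine; Varnish runs your code first and then continues with the built-in logic. Reload Varnish (service varnish reload) for the change to take effect.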

Usually the above statement is enough to ensure that all requests for a given URL return exactly the same result and thus are cached by Varnish. After making the change, check the Age header of the URL again to make sure it increases for subsequent requests.
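
The vcl_fetch rule above only strips cookies that the back end sets. Because cookies sent in the request are another common reason for Varnish not to cache, a matching rule in vcl_recv may also be needed. The following is a minimal sketch that assumes, as on the example site, that only URLs under /admin/ rely on cookies.

```vcl
sub vcl_recv {
    # Assumption: nothing outside /admin/ needs the visitor's cookies.
    if (!(req.url ~ "^/admin/")) {
        # Drop request cookies so the built-in logic can look the object up in the cache.
        unset req.http.Cookie;
    }
}
```

Do not apply this to pages that depend on cookies to work, such as a shopping cart.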

If the Age value still does not increase, you can try changing the TTL. In the vcl_fetch subroutine add the line set beresp.ttl = 1d; to cache each object for one day from the time it is first fetched.
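
Put together with the cookie rule, the subroutine could look like the sketch below. Applying the one-day TTL only outside /admin/ is an assumption carried over from the example site; adjust the condition to your own URL layout.

```vcl
sub vcl_fetch {
    if (!(req.url ~ "^/admin/")) {
        # Strip cookies set by the back end so the response can be cached.
        unset beresp.http.set-cookie;
        # Keep the object for one day, overriding whatever the back end suggested.
        set beresp.ttl = 1d;
    }
}
```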

Usually removing the cookies and manually setting a TTL does the trick and makes Varnish start caching. For more options and advice, check the complete documentation about increasing your hit rate.

Varnish as a load balancer for high availability

Varnish can work with multiple back-end servers that are united in one governing entity called a director. Defining a back end for balancing is similar to defining a standalone back end. The difference comes from the probe options, for back-end health probing, which are important for load balancing. Here is an example from the file /etc/varnish/default.vcl:

backend server1 {
  .host = "192.168.1.101";
  .port = "80";
  .probe = {
    .url = "/index.php";
    .interval = 30s;
    .timeout = 10s;
    .window = 5;
    .threshold = 5;
  }
}

The probe configuration has the following options:

  • url is the path on the back end that the probe requests. If you need a more complex check, you can use request instead of url and spell out the full HTTP request, including custom request headers.
  • interval is the time between each probe. Its value should be higher than the value of the next option, timeout. The more critical downtime is, the lower this value should be.
  • timeout is the maximum time a probe may take to complete. To accommodate slow applications, start with a high value.
  • window is the number of most recent probes that Varnish considers when judging a back end's health.
  • threshold is the number of probes within that window that must succeed for the back-end server to be declared healthy.
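
As a sketch of the request form of the probe, the back end below sends a full HTTP request with a Host header instead of a bare URL. The address 192.168.1.102, the host name example.org, and the threshold of 3 are assumptions made for illustration.

```vcl
backend server2 {
  .host = "192.168.1.102";        # assumed address of a second web server
  .port = "80";
  .probe = {
    # Each string becomes one line of the probe request.
    .request =
      "GET /index.php HTTP/1.1"
      "Host: example.org"
      "Connection: close";
    .interval = 30s;
    .timeout = 10s;
    .window = 5;
    .threshold = 3;               # healthy if 3 of the last 5 probes succeeded
  }
}
```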

Using this example as a model, add the rest of your back-end servers to the configuration file /etc/varnish/default.vcl. The next step is to configure the director, which does the load balancing:

director mybalancer client {
        {
                .backend = server1;
                .weight = 1;
        }
        {
                .backend = server2;
                .weight = 1;
        }
}

In the above example, we start by first naming the balancer mybalancer. Next, the client option determines how balancing is done; each request from a single client will be sent to the same back-end server. This can be important when clients go through a sequential process and you want them to start and finish on the same back-end server. By default, IP addresses are used to differentiate clients.

Besides client-based balancing, you can also specify random or round-robin. With the client and random directors, back ends receive requests in proportion to their weight factors. Round-robin simply takes the back ends in turn, so you shouldn't specify a weight factor for it.
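
The two alternatives might be sketched as follows. The director names balancer_random and balancer_rr are hypothetical, and the 3:1 weighting is only an illustration. Varnish may refuse to load VCL that contains a director nothing refers to, so keep only the one that vcl_recv actually uses.

```vcl
# Random balancing: server1 receives roughly three requests for each one sent to server2.
director balancer_random random {
        {
                .backend = server1;
                .weight = 3;
        }
        {
                .backend = server2;
                .weight = 1;
        }
}

# Round-robin balancing: back ends are used in turn, so no weight is given.
director balancer_rr round-robin {
        {
                .backend = server1;
        }
        {
                .backend = server2;
        }
}
```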

Once the director is configured, the last step is to instruct Varnish to use it with some code in the vcl_recv subroutine. For the current example, the configuration looks like this:

sub vcl_recv {
   set req.backend = mybalancer;
   set client.identity = client.ip;
}

Varnish as a simple WAF for improved security

In addition to its utility for caching and load-balancing, Varnish can be used to hide information about the origin server and thus make targeted attacks harder. More specifically, with Varnish you can hide or override any header from the origin server. For example, by default PHP discloses its version in a header statement such as X-Powered-By: PHP/5.4.4-10. To hide this header, edit the file /etc/varnish/default.vcl, uncomment the configuration part for vcl_deliver, and make it look like:

sub vcl_deliver {
    remove resp.http.X-Powered-By;
    return (deliver);
}

You can also change headers such as the web server name and its version. By default, Apache sends a Server header such as Server: Apache/2.2.15 (CentOS). If you were feeling mischievous, you could make the server look like nginx by editing the configuration for vcl_fetch and adding:

sub vcl_fetch {
     unset beresp.http.Server;
     set beresp.http.Server = "nginx";
     return (deliver);
}

With VCL you can filter any part of the client's request and server's response, turning Varnish into a powerful web application firewall. For example, if you want to filter GET requests containing PHP's passthru function to block certain attacks, edit vcl_recv and add the following condition:

sub vcl_recv {
    if (req.url ~ "passthru\(") {
         error 404 "No such file";
    }
    # No return here, so Varnish's built-in vcl_recv logic
    # (handling of POST, cookies, and so on) still runs.
}

Once that's in effect, Varnish will send 404 errors to any client trying to specify the passthru function in the GET request.
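
A literal match is easy to sidestep, for example by changing the case of the function name or by URL-encoding the opening parenthesis as %28. The sketch below is one possible tightening of the same rule, not a complete defence. It assumes Varnish 3.0, where a percent sign inside a VCL string is taken literally.

```vcl
sub vcl_recv {
    # (?i) makes the match case-insensitive; the group accepts "(" or its encoded form "%28".
    if (req.url ~ "(?i)passthru(\(|%28)") {
        error 404 "No such file";
    }
}
```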

Varnish can do more still to protect your site. In fact, entire projects have been dedicated to this, such as Security VCL.

Still, Varnish is not designed to work as a web application firewall, and that's why it lacks important functionality such as POST request inspection. If you are looking for a complete WAF solution, read Protect And Audit Your Web Server With ModSecurity.

I hope you can see why Varnish is so popular. It's widely considered the best web caching proxy available today. You should now be able to deploy it yourself and improve the performance, security, and availability of your site.




This work is licensed under a Creative Commons Attribution 3.0 Unported License

Comments

I routinely set the following, just for fun:

set beresp.http.Server = "IIS 2.0";
Posted @ Thursday, February 14, 2013 10:22 AM by Ben

Not convinced by the discussion of Varnish "improving" security. Disabling identification headers (including PHP) is Security 101. Adding complexity to avoid a problem is never a good idea and only a tiny, TINY fraction of real attacks are targeted to use this information. Further, as Varnish uses lightweight processes (threads) it provides negligible protection against slowloris-type attacks (compared with an event-based server such as nginx, ATS or lighttpd). VCL does allow for complex scripting of request handling, but it's very difficult to take advantage of this without having a compiler tool chain installed on the device, which is a really bad idea from a security viewpoint. Varnish supports ESI. Properly set up, this offers HUGE performance benefits for most websites, but it doesn't even get a mention here?
Posted @ Friday, February 15, 2013 9:23 AM by symcbean

@Anatoliy Dimitrov
* Security.VCL is mostly deprecated. For a full-blown WAF try Varnish Security Firewall: https://github.com/comotion/VSF

@symcbean
* Varnish does protect you against slowloris and has for a long time.
* Talking about a compiler as a security issue makes no sense. And if you must, it is indeed possible to run Varnish without a compiler, although I would advise against doing so.
Posted @ Monday, February 18, 2013 9:22 AM by Rubén Romero