Tags
black swan business caching cdn Collaboration Content Management customer underground daas dbpedia ddu ddu2012 drupal Drupal Uriverse DBpedia Solr facebook gpl hosting location mailchimp mapping mercury mobile modules multisite PayPa performance plugin pressflow progressive enhancement Publishing randomness ratings rdfa responsive web design reviews saas scalability semanticweb Social Subscription themes topicmaps uriverse wikipedia wordle wordpress


Drupal LAMP Server Tuning (part 6)
Getting the most from your Drupal site means getting the most from your server – optimizing the various layers of the the LAMP stack. This includes the filesystem, database, web server, PHP, RAM and CPU. Tuning the LAMP stack is a major subject requiring a lot of study and practice to become proficient. It’s something you will probably never completely master
Try Googling lamp performance tune for a few articles to whet your appetite. For now, we’ll cover a few of the major considerations for Drupal, although most of this advice would apply to any PHP web app running on Linux.
Opcode cache
Opcode caches cache the compiled form of a PHP script in shared memory to avoid the overhead of parsing and compiling the code every time the script runs. This saves RAM and reduces script execution time.
Quite a bit of benchmarking has been done in the Drupal and PHP communities between APC, eAccelerator and XCache. eAccelerator may have the edge in raw performance, but it appears that APC is the preferred opcode cache in the Drupal community because it is well maintained and less buggy.
All sites: faster and less RAM. Moderate install.
Database
There are a number of choices to be made when tuning your MySQL database server. The MySQLTuner script can be helpful for identifying outstanding issues you may be unaware of. It can be run on a functioning production server to see how your database is performing in the wild. It’s possible to take a best guess at config options on your dev machine but you aren’t going to know how things are going to shape up until real users start hitting the DB.
Storage Engine: InnoDB vs MyISAM
A default install of Drupal 6 installs the DB tables as MyISAM. This will change in Drupal 7 with the default set to InnoDB. A Drupal 6 installation may well have some InnoDB tables as modules may create new tables in the InnoBD engine. Your installation may therefore be a mix between the two engines.
In many places on the web you will read statements such as ‘All high performance Drupal sites run InnoDB”. This is not necessarily so as there are some cases where MyISAM may still be preferred although with recent changes to Drupal core the pendulum has swung to InnoDB as a sensible default.
A list of the main difference between the engines is as follows:/p>
In general, you would consider sticking with MyISAM if
InnoDB tables definitely should be used for all of the Drupal cache tables since this is where most contention is likely to occur.
Finally, it must be noted that Drupal was written based on the MyISAM engine and as such many queries were not optimized for InnoDB. The SELECT COUNT(*) is particularly slow in InnoDB because it must scan all rows to calculate the count. Many of these shortcomings have been removed in the PressFlow distribution and have since made their way back into core.
All sites: InnoDB for less contention on cache
Most sites: InnoDB for everything else
Big unchanging sites: MyISAM faster reads less RAM
MySQL Configuration
There are a number of MySQL config variables which must be tweaked to suit your data. It is impossible to specify one set of options to suit all sites. A few rules of thumb are offered below.
Key buffer
If you are running MyISAM tables then the key buffer is a very important variable to set. The key buffer stores table indexes in memory, allowing for fast lookups and joins. For large node, node_version and url_alias tables it is a must to have enough room to fit these tables into memory, otherwise your site will very slow on the most basic of operations: looking up nodes, titles and paths.
One rule of thumb is to set this buffer to somewhere between 25% and 50% of the memory on the server. To determine the best value up front sum the size of all the .MYI files.
MyISAM sites: most queries faster. Essential.
Query cache
MySQL has a query cache which stores results up to a certain size in memory. The cache is very handy for quickly returning commonly accessed data when all other forms of caching (reverse proxies, page cache, Drupal caches) have not been invoked. Queries which may take sometime return almost instantly.
During the development and testing of a site the query cache can catch developers out since a query may appear to be performing quite well the second and subsequent times through. To really test a query you need to fire up mysql client (or phpmyadmin) and add the SQL_NO_CACHE option to the query to see the real time it takes. Don’t be fooled!
The query cache is destroyed if any row in the table is changed and so it cannot be relied upon if tables are changing frequently. The cache shines when the are big tables which don’t change that often. Unless your site has such characteristics it is best to limit it so that it fits small unchanging tables and then some for the most popular queries. Examination of cache hit rates will show you if it needs to be extended or reduced.
All sites: common queries faster
InnoDB Buffer Pool Size
If you are running InnoDB tables then it is essential to optimize the InnoDB Buffer Pool Size, increasing the memory to reduce query time. InnoDB is more memory intensive and so the pool will be larger than that used for MyISAM tables. MySQL documentation suggests that the size can be upped to 80% of physical memory. Anymore could lead to swap issues.
InnoDBsites: most queries faster. Essential.
Other variables
Other variables worth tweaking include the following. See Optimizing the mysqld variables for more.
Database Warmup
A warm database will perform much better than a recently started one because its caches and buffers will be primed with keys and data. It therefore makes sense to warm up a DB every time the database is restarted. The best way to do this is to load in the indexes of commonly used tables. This guide recommends loading in node, node_revisions and url_alias. Taxonomy information could be good candidates as well.
USE drupal6;
LOAD INDEX INTO CACHE node;
LOAD INDEX INTO CACHE node_revisions;
LOAD INDEX INTO CACHE url_alias;
LOAD INDEX INTO CACHE term_data;
LOAD INDEX INTO CACHE term_node;
This SQL code can then be put in a script and run when MySQL restarts. It is possible to configure the
init_filevariable in my.cnf to tell mysql where to find the startup SQL.init-file = /etc/mysql/init-file.sqlMany nodes: Most queries where index relied upon.
LOAD INDEX INTO CACHE t1.init_filevariable to specify startup SQL.Database Indexes
Indexes on columns can dramatically speed up queries if the columns are used for filtering, sorting or joining. Generally, Drupal has most of the indexes you need covered, however, there are some areas where standard tables can benefit from an additional index. It is recommended that you profile your queries to see where things are slow before adding indexes in a scattergun approach because adding indexes can harm performance if they are not being used properly. You can use MySQL’s slow query log for queries with no index to identify areas for improvement.
mikeytown2 has come up with a list of tables which could do with an index:
Web server
Apache + MPM Prefork + mod_php is the default web server configuration in the LAMP stack. This combination does consume large amounts of RAM which can be a problem for handling many requests. It can also be quite heavy and slow for serving static content. Many administrators have looked to replace it with other combinations including multithreaded processes (MPM Worker) and external PHP (mod_fcgid) as well as swapping it out completely for another server such as Nginx. This guide has adopted the position that Apache problems can be ameliorated somewhat by removing unneeded modules, running fcgid to connect with PHP and using MPM Worker to enable multithreading per process. However, in some cases this won’t be enough and Nginx is a must.
Apache vs Nginx
Other Drupal users have replaced Apache with faster more lightweight (RAM and CPU) web servers such as Nginx and Lighttpd. Nginx is generally preferred over Lighttpd because of memory leaks in the latter. It is currently possible to run Nginx without losing any functionality in Drupal. Boost, a module based on .htaccess rules, now supports Nginx so it is feasible to run Nginx as the main web server. If you are constrained by CPU or have high loads then this certainly is an option worth considering.
Setting up Nginx is not trivial but it is reasonably straight forward if you are comfortable with compiling and patching. There are some good tutorials on the Web for user who want to do this.
Low resources, High Traffic, Many logged in: Possible to get more for less with Nginx.
Apache: Unneeded modules
It is possible to turn off unneeded modules in Apache to reduce memory footprint. The modules you require depends very much on your setup.
The traditional way of controlling modules in Apache has been through the LoadModule directive in httpd.conf. Ubuntu and Debian do it differently with the /etc.apache2/mods-available directory and the a2enmod command. To list all modules to enable try:
$ sudo a2enmod
$ sudo /etc/init.d/apache2 force-reload
And to see what you have enabled you can do
$ sudo a2dismod.All sites: Good savings in RAM
Apache threading: MPM Worker (multi threaded) MPM Prefork
The use of MPM Worker allows for the handling of more requests due to multithreading in each process. It has a smaller memory footprint than Prefork and is faster. According to docs, Apache must be compiled with the
--with-mpmargument in order to install Worker as “prefork” is the default on Unix systems.RAM limited: Worker preferable to Prefork.
Connectors: mod_php, FastCGI, mod_fcgid
The use of mod_php with Apache is the most common setup for calling PHP. mod_php works by embedding PHP into every Apache process. This has the disadvantage of a large memory footprint for each Apache process. FastCGI and mod_fcgid overcomes this problem and reduces resource utilization with no gains in performance.
kbahey lists the disadvantages of mod_php as follows:
Use mod_fcgid for lower memory and DB/Network connections
Apache MaxClients
The MaxClients parameter controls how many simultaneous clients Apache is able to serve. If it is set to high RAM will be chewed up and the Machine will go into swap. If it is set to low then your site will be unnecessarily limited by the number of clients it can serve. The setting of this value should be determined after consideration of (i) how much spare RAM is available on the server and (ii) how much RAM each Apache process consumes. Obviously you will want to maximize available RAM through frugal allocation of RAM to MySQL, JVM, etc and minimize the size of Apache process through techniques described above.
2bits provide the following formula:
MaxClients = (Total Memory - Operating System Memory - MySQL memory) / Size Per Apache process.The only addition this guide would make is that it is important to leave some RAM free for the OS file buffer to allow efficient operation of the OS.
.htaccess
If you are running Apache then it is possible to either use .htaccess or the apache conf file to specify directives such as rewrite rules, etc. If you use .htaccess then Apache must look for .htaccess rules in the directory hierarchy for every request. This can take some time even if no rules are found. You may consider putting the rules in httpd.conf/apache2.conf if you are looking to eek out the most performance from your site.
.htaccess can slow down site if performance is crucial.
RAM: A precious resource
Given the above, serious thought should be given to how the RAM on your box is to be divided up. In a nutshell we have the following apps contesting for their fair share:
Consider the following when deciding how to divide up your box:
This article forms part of a series on Drupal performance and scalability. The first article in the series is Squeezing the last drop from Drupal: Performance and Scalability.