Linux WWW HOWTO <author>by Wayne Leister, <tt> <htmlurl url="mailto:n3mtr@qis.net" name="n3mtr@qis.net"></tt> <date>v0.82, 19 November 1997 <abstract> This document contains information about setting up WWW services under Linux (both server and client). It is not meant to be a detailed manual, but rather an overview and a good pointer to further information. <p> <bf>Archived Document Notice:</bf> This document has been archived by the LDP because it is severely out-of-date. If you are interested in maintaining this document, contact <htmlurl url="http://tldp.org" name="The Linux Documentation Project">. </p> </abstract> <toc> <!-- Introduction SECTION ================================================== --> <sect>Introduction <p> Many people are trying Linux because they are looking for a really good <em>Internet capable</em> operating system. Also, there are institutes, universities, non-profits, and small businesses that want to set up Internet sites on a small budget. This is where the WWW-HOWTO comes in. This document explains how to set up clients and servers for the largest part of the Internet - <em>The World Wide Web</em>. All prices in this document are stated in US dollars. This document assumes you are running Linux on an Intel platform. Instructions and product availability may vary from platform to platform. There are many links for downloading software in this document. Whenever possible, use a mirror site for faster downloading and to keep the load down on the main server. The US government forbids US companies from exporting encryption stronger than 40 bits. Therefore US companies will usually offer two versions of their software: the import version usually supports 128 bit encryption, and the export version only 40 bit. This applies to web browsers and servers that support secure transactions. Secure transactions on the web use the Secure Sockets Layer (SSL) protocol; we will refer to it as SSL for the rest of this document. 
<sect1>Copyright <p> This document is Copyright (c) 1997 by Wayne Leister. The original author of this document was Peter Dreuw (all versions prior to 0.8). <quote> This HOWTO is free documentation; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.</quote> <quote> This document is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See the GNU General Public License for more details.</quote> <quote> You can obtain a copy of the GNU General Public License by writing to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. </quote> Trademarks are owned by their respective owners. <sect1>Feedback <p> Any feedback is welcome. I do not claim to be an expert. Some of this information was taken from badly written web sites; there are bound to be errors and omissions. But make sure you have the latest version before you send corrections; the problem may already be fixed there (see the next section for where to get the latest version). Send feedback to <htmlurl url="mailto:n3mtr@qis.net" name="n3mtr@qis.net">. <sect1>New versions of this Document <p> New versions of this document can be retrieved in text format from Sunsite at <url url="http://sunsite.unc.edu/pub/Linux/docs/HOWTO/WWW-HOWTO"> and almost any Linux mirror site. You can view the latest HTML version on the web at <url url="http://sunsite.unc.edu/LDP/HOWTO/WWW-HOWTO.html">. There are also HTML versions available on Sunsite in a tar archive. <!-- WWW = CLIENT SECTION =========================================== --> <sect>Setting up WWW client software <p> The following chapter is dedicated to setting up web browsers. Please feel free to contact me if your favorite web browser is not mentioned here. 
In this version of the document only a few of the browsers have their own section, but I tried to include all of them (all I could find) in the overview section. In the future, those browsers that deserve their own section will have one. The overview section is designed to help you decide which browser to use and to give you basic information on each browser. The detail sections are designed to help you install, configure, and maintain the browser. Personally, I prefer Netscape; it is the only browser that keeps up with the latest things in HTML, for example frames, Java, Javascript, style sheets, secure transactions, and layers. Nothing is worse than trying to visit a web site and finding out that you can't view it because your browser doesn't support some new feature. However, I use Lynx when I don't feel like firing up the X-windows/Netscape monster. <sect1>Overview <p> <descrip> <tag><ref id="netscape" name="Navigator/Communicator"></tag> Netscape Navigator is the only browser mentioned here that is capable of advanced HTML features. Some of these features are frames, Java, Javascript, automatic update, and layers. It also has news and mail capability. But it is a resource hog; it takes up lots of CPU time and memory. It also sets up a separate cache for each user, wasting disk space. Netscape is a commercial product. Companies have a 30 day trial period, but there is no limit for individuals. I would encourage you to register anyway to support Netscape in their efforts against Microsoft (and what is a measly $40US?). My guess is that if Microsoft wins, we will be forced to use MS Internet Explorer on a Windows platform :( <tag><ref id="lynx" name="Lynx"></tag> Lynx is one of the smallest web browsers. It is the king of text based browsers. It's free and the source code is available under the GNU General Public License. It's text based, but it has many special features. <tag/Kfm/ Kfm is part of the K Desktop Environment (KDE). 
KDE is a system that runs on top of X-windows. It gives you many features like drag and drop, sounds, a trashcan and a unified look and feel. Kfm is the K File Manager, but it is also a web browser. Don't be fooled by the name; for a young product it is very usable as a web browser. It already supports frames, tables, ftp downloads, looking into tar files, and more. The current version of Kfm is 1.39, and it's free. Kfm can be used without KDE, but you still need the libraries that come with KDE. For more information about KDE and Kfm visit the KDE website at <url url="http://www.kde.org">. <tag><ref id="emacs" name="Emacs"></tag> Emacs is the one program that does everything. It is a text editor, news reader, mail reader, and web browser. It has a steep learning curve at first, because you have to learn what all the keys do. The X-windows version is easier to use, because most of the functions are on menus. Another drawback is that it's mostly text based (it can display graphics if you are running it under X-windows). It is also free, and the source code is available under the GNU General Public License. <tag/NCSA Mosaic/ Mosaic is an X-windows browser developed by the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NCSA spent four years on the project and has now moved on to other things. The latest version is 2.6, which was released on July 7, 1995. Source code is available for non-commercial use. <url url="http://www.spyglass.com" name="Spyglass Inc."> has the commercial rights to Mosaic. It's a solid X-windows browser, but it lacks the new HTML features. For more info visit the NCSA Mosaic home page at <url url="http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/">. The software can be downloaded from <url url="ftp://ftp.ncsa.uiuc.edu/Mosaic/Unix/binaries/2.6/Mosaic-linux-2.6.Z">. <tag/Arena/ Arena was an X-windows concept browser for the W3C (World Wide Web Consortium) when they were testing HTML 3.0. 
Hence it supports HTML 3.0 features such as style sheets and tables. Development was taken over by Yggdrasil Computing, with the idea of turning it into a full-fledged free X-windows browser. However, development stopped in February 1997 with version 0.3.11. Only part of the HTML 3.2 standard has been implemented. The source code is released under the GNU General Public License. For more information see the web site at <url url="http://www.yggdrasil.com/Products/Arena/">. It can be downloaded from <url url="ftp://ftp.yggdrasil.com/pub/dist/web/arena/">. <tag/Amaya/ Amaya is the W3C's X-windows concept browser for HTML 3.2. Therefore it supports all of the HTML 3.2 standard. It also supports some of the features of HTML 4.0. It supports tables, forms, client side image maps, HTTP PUT publishing, and GIF, JPEG, and PNG graphics. It is both a browser and an authoring tool. The latest public release is 1.0 beta. Version 1.1 beta is in internal testing and is due out soon. For more information visit the Amaya web site at <url url="http://www.w3.org/Amaya/">. It can be downloaded from <url url="ftp://ftp.w3.org/pub/Amaya-LINUX-ELF-1.0b.tar.gz">. <tag/Red Baron/ Red Baron is an X-windows browser made by Red Hat Software. It is bundled with The Official Red Hat Linux distribution. I could not find much information on it, but I know it supports frames, forms and SSL. If you use Red Baron, please help me fill in this section. For more information visit the Red Hat website at <url url="http://www.redhat.com">. <tag/Chimera/ Chimera is a basic X-windows browser. It supports some of the features of HTML 3.2. The latest release is 2.0 alpha 6, released August 27, 1997. For more information visit the Chimera website at <url url="http://www.unlv.edu/chimera/">. Chimera can be downloaded from <url url="ftp://ftp.cs.unlv.edu/pub/chimera-alpha/chimera-2.0a6.tar.gz">. <tag/Qweb/ Qweb is yet another basic X-windows browser. It supports tables, forms, and server side image maps. 
The latest version is 1.3. For more information visit the Qweb website at <url url="http://sunsite.auc.dk/qweb/">. The source is available from <url url="http://sunsite.auc.dk/qweb/qweb-1.3.tar.gz">. The binaries are available in a Red Hat RPM from <url url="http://sunsite.auc.dk/qweb/qweb-1.3-1.i386.rpm">. <tag/Grail/ Grail is an X-windows browser developed by the Corporation for National Research Initiatives (CNRI). Grail is written entirely in Python, an interpreted object-oriented language. The latest version is 0.3, released on May 7, 1997. It supports forms, bookmarks, history, frames, tables, and many HTML 3.2 features. <tag/Internet Explorer/ There are rumors that Microsoft is going to port Internet Explorer to various Unix platforms, maybe including Linux. If it's true, they are taking their time doing it. If you know something more reliable, please drop me an e-mail. </descrip> In my humble opinion most of the above software is unusable for serious web browsing. I'm not trying to discredit the authors; I know they worked very hard on these projects. Just think: if all of these people had worked together on one project, maybe we would have a free browser that would rival Netscape and Internet Explorer. In my opinion, of all the browsers, Netscape and Lynx are the best. The runners up would be Kfm, Emacs-W3 and Mosaic. <!-- Lynx =================================== --> <sect>Lynx<label id="lynx"> <p> Lynx is one of the smaller (around a 600 KB executable) and faster web browsers available. It uses little bandwidth and few system resources, as it only deals with text displays. It can display on any console, terminal or xterm. You will not need an <em>X Windows system</em> or additional system memory to run this little browser. <sect1>Where to get <p> Both the Red Hat and Slackware distributions include Lynx. Therefore I will not bore you with the details of compiling and installing Lynx. 
The latest version is 2.7.1 and can be retrieved from <url url="http://www.slcc.edu/lynx/fote/"> or from almost any friendly Linux FTP server like <htmlurl url="ftp://sunsite.unc.edu/pub/Linux/apps/www/browsers/" name="ftp://sunsite.unc.edu under /pub/Linux/apps/www/browsers/"> or a mirror site. For more information on Lynx try these locations: <descrip> <tag/Lynx Links/ <url url="http://www.crl.com/~subir/lynx.html"> <tag/Lynx Pages/ <url url="http://lynx.browser.org"> <tag/Lynx Help Pages/ <url url="http://www.crl.com/~subir/lynx/lynx_help/lynx_help_main.html"> (the same pages you get from lynx --help and typing ? in lynx) </descrip> Note: The Lynx help pages have recently moved. If you have an older version of Lynx, you will need to change your lynx.cfg (in /usr/lib) to point to the new address (above). I think the feature that most sets Lynx apart from other web browsers is batch mode retrieval. One can write a shell script that retrieves a document, file or anything like that via <em/http/, <em/FTP/, <em/gopher/, <em/WAIS/, <em/NNTP/ or <em>file://</em> URLs and saves it to disk. Furthermore, one can fill in data into HTML forms in batch mode by simply redirecting the standard input and using the <em/-post_data/ option. For more of Lynx's special features, see the help files and the man pages. If you use a special feature of Lynx that you would like to see added to this document, let me know. <!-- Emacs W3 ================================= --> <sect>Emacs-W3<label id="emacs"> <p> There are several different flavors of Emacs. The two most popular are GNU Emacs and XEmacs. GNU Emacs is put out by the Free Software Foundation, and is the original Emacs. It is mainly geared toward text based terminals, but it does run in X-Windows. XEmacs (formerly Lucid Emacs) is a version geared toward X-Windows. It has many special features that are X-Windows related (better menus etc). 
<sect1>Where to get <p> Both the Red Hat and Slackware distributions include GNU Emacs. The most recent GNU Emacs is 19.34. It doesn't seem to have a web site. The FTP site is at <url url="ftp://ftp.gnu.ai.mit.edu/pub/gnu/">. The latest version of XEmacs is 20.2. The XEmacs FTP site is at <url url="ftp://ftp.xemacs.org/pub/xemacs">. For more information about XEmacs see its web page at <url url="http://www.xemacs.org">. Both are available from the Linux archives at <htmlurl url="ftp://sunsite.unc.edu/pub/Linux/apps/editors/emacs/" name="ftp://sunsite.unc.edu under /pub/Linux/apps/editors/emacs/">. If you have GNU Emacs or XEmacs installed, you probably have the W3 browser as well. The Emacs W3 mode is a nearly fully featured web browser written in Emacs Lisp. It mostly deals with text, but it can display graphics too, at least if you run Emacs under the X Window system. To get XEmacs into W3 mode, go to the apps menu and select browse the web. I don't use Emacs, so if someone will explain how to get GNU Emacs into W3 mode I'll add it to this document. Most of this information was from the original author. If any information is incorrect, please let me know. Also let me know if you think anything else should be added about Emacs. <!-- Netscape Navigator/Communicator ======================= --> <sect>Netscape Navigator/Communicator<label id="netscape"> <p> <sect1>Different versions and options. <p> Netscape Navigator is the King of WWW browsers. Netscape Navigator can do almost everything. But on the other hand, it is one of the most memory-hungry and resource-eating programs I've ever seen. There are three different versions of the program: Netscape Navigator includes the web browser, Netcaster (push client) and a basic mail program. Netscape Communicator includes the web browser, a web editor, an advanced mail program, a news reader, Netcaster (push client), and a group conference utility. 
Netscape Communicator Pro includes everything Communicator has plus a group calendar, IBM terminal emulation, and remote administration features (administrators can update thousands of copies of Netscape from their desk). In addition to the three versions there are two other options you must pick. The first is full install or base install. The full install includes everything. The base install includes enough to get you started. You can download the additional components as you need them (such as multimedia support and Netcaster). These components can be installed by the Netscape Smart Update utility (after installing, go to help->software updates). At this time the full install is not available for Linux. The second option is import or export. If you are in the US or Canada, you have the option of selecting the import version. This gives you the stronger 128 bit encryption for secure transactions (SSL). The export version only has 40 bit encryption, and is the only version allowed outside the US and Canada. The latest version of Netscape Navigator/Communicator/Communicator Pro is 4.03. There are two different versions for Linux: one for the old 1.2 series kernels and one for the new 2.0 kernels. If you don't have a 2.0 kernel, I suggest you upgrade; there are many improvements in the new kernel. Beta versions are also available. If you try a beta version, be aware that betas usually expire in a month or so! <sect1>Where to get <p> The best way to get Netscape software is to go through their web site at <url url="http://www.netscape.com/download/">. They have menus to guide you through the selection. When it asks for the Linux version, it is referring to the kernel (most people should be using 2.0 by now). If you're not sure which kernel version you have, run <tt>cat /proc/version</tt>. Going through the web site is the only way to get the import versions. If you want an export version, you can download it directly from the Netscape FTP servers. 
The FTP servers are also more up to date. For example, when I first wrote this the web interface did not have the non-beta 4.03 for Linux yet, but it was on the FTP site. Here are the links to the export Linux 2.0 versions: Netscape Navigator 4.03 is at <url url="ftp://ftp.netscape.com/pub/communicator/4.03/shipping/english/unix/linux20/navigator_standalone/navigator-v403-export.x86-unknown-linux2.0.tar.gz"> Netscape Communicator 4.03 for Linux 2.0 (kernel) is at <url url="ftp://ftp.netscape.com/pub/communicator/4.03/shipping/english/unix/linux20/base_install/communicator-v403-export.x86-unknown-linux2.0.tar.gz"> Communicator Pro 4.03 for Linux was not available at the time I wrote this. These URLs will change as new versions come out. If these links break, you can find the files by fishing around at the FTP site <url url="ftp://ftp.netscape.com/pub/communicator/">. These servers are heavily loaded at times. It's best to wait for off-peak hours or select a mirror site. Be prepared to wait; these archives are large. Navigator is almost 8 MB, and the Communicator base install is 10 MB. <sect1>Installing <p> This section explains how to install version 4 of Netscape Navigator, Communicator, and Communicator Pro. First unpack the archive to a temporary directory. Then run the <tt/ns-install/ script (type <tt>./ns-install</tt>). Then create a symbolic link named <tt>/usr/local/bin/netscape</tt> that points to the <tt>/usr/local/netscape/netscape</tt> binary (type <tt>ln -s /usr/local/netscape/netscape /usr/local/bin/netscape</tt>). Finally, set the system-wide environment variable <tt>$MOZILLA_HOME</tt> to <tt>/usr/local/netscape</tt> so Netscape can find its files. If you are using bash for your shell, edit your <tt>/etc/profile</tt> and add the lines: <tscreen><verb>
MOZILLA_HOME="/usr/local/netscape"
export MOZILLA_HOME
</verb></tscreen> After you have it installed, the software can automatically update itself with Smart Update. Just run Netscape as root and go to help->software updates. 
If you only got the base install, you can also install the Netscape components from there. Note: installing a new version does not remove old versions of Netscape; you must remove them manually by deleting the old Netscape binary and Java class file (for version 3). <!-- WWW - SERVER SECTION ======================================== --> <sect>Setting up WWW server systems <p> This section contains information on different HTTP server software packages and additional server side tools like script languages for CGI programs. There are several dozen web servers; I have only covered those that are fully functional. As some of these are commercial products, I have had no way of trying them. Most of the information in the overview section was pieced together from various web sites. If there is any incorrect or missing information, please let me know. For a technical description of the HTTP mechanism, take a look at the RFC documents mentioned in the chapter "For further reading" of this HOWTO. I prefer to use the Apache server. It has almost all the features you would ever need and it's free! I will admit that this section is heavily biased toward Apache. I decided to concentrate my efforts on the Apache section rather than spread them out over all the web servers. I may cover other web servers in the future. <!-- Overview =============================== --> <sect1>Overview <p> <descrip> <tag/CERN httpd/ This was the first web server. It was developed by the European Laboratory for Particle Physics (CERN). CERN httpd is no longer supported. The CERN httpd server is reported to have some ugly bugs and to be quite slow and resource hungry. The latest version is 3.0. For more information visit the CERN httpd home page at <url url="http://www.w3.org/Daemon/Status.html">. 
It is available for download at <url url="ftp://sunsite.unc.edu/pub/Linux/apps/www/servers/httpd-3.0.term.tpz"> (no, it is not a typo: the extension on the site really is .tpz; it probably should be .tgz). <tag/NCSA HTTPd/ The NCSA HTTPd server is the father of Apache (development split into two separate servers). Therefore the setup files are very similar. NCSA HTTPd is free and the source code is available. This server is not covered in this document, although reading the Apache section may give you some help. The NCSA server was once popular, but most people are replacing it with Apache. Apache is a drop-in replacement for the NCSA server (same configuration files), and it fixes several shortcomings of the NCSA server. NCSA HTTPd accounts for 4.9% (and falling) of all web servers (source: September 1997 <url url="http://www.netcraft.com/survey/" name="Netcraft survey">). The latest version is 1.5.2a. For more information see the NCSA website at <url url="http://hoohoo.ncsa.uiuc.edu">. <tag><ref id="apache" name="Apache"></tag> Apache is the king of all web servers. Apache and its source code are free. Apache is modular, so it is easy to add features. Apache is very flexible and has many, many features. Apache makes up 44% of all web domains (50% if you count all the derivatives). There are over 695,000 Apache servers in operation (source: November 1997 <url url="http://www.netcraft.com/survey/" name="Netcraft survey">). The official Apache lacks SSL support, but there are two derivatives that fill the gap. Stronghold is a commercial product that is based on Apache. It retails for $995; an economy version is available for $495 (based on an old version of Apache). Stronghold is the number two secure server behind Netscape (source: <url url="http://www.c2.net/products/stronghold" name="C2 net"> and <url url="http://www.netcraft.com/survey/" name="Netcraft survey">). 
For more information visit the Stronghold website at <url url="http://www.c2.net/products/stronghold/">. It was developed outside the US, so it is available with 128 bit SSL everywhere. Apache-SSL is a free implementation of SSL, but it is not for commercial use in the US (RSA holds US patents on the public-key encryption that SSL uses). It can be used for non-commercial purposes in the US if you link it with the free RSAREF library. For more information see the website at <url url="http://www.algroup.co.uk/Apache-SSL/">. <tag/Netscape Fast Track Server/ Fast Track was developed by Netscape, but the Linux version is put out by Caldera. The Caldera site lists it as Fast Track for OpenLinux. I'm not sure if it only runs on Caldera OpenLinux or if any Linux distribution will do (e-mail me if you have the answer). Netscape servers account for 11.5% (and falling) of all web servers (source: September 1997 <url url="http://www.netcraft.com/survey/">). The server sells for $295. It is also included with the Caldera OpenLinux Standard distribution, which sells for $399 ($199.50 educational). The web pages tell of a nice administration interface and a quick 10 minute setup. The server has support for 40-bit SSL. To get the full 128-bit SSL you need Netscape Enterprise Server. Unfortunately that is not available for Linux :( The latest version available for Linux is 2.0 (version 3 is in beta, but it's not available for Linux yet). To buy a copy go to the Caldera web site at <url url="http://www.caldera.com/products/netscape/netscape.html">. For more information go to the Fast Track page at <url url="http://www.netscape.com/comprod/server_central/product/fast_track/">. <tag/WN/ WN has many features that make it attractive. First, it is smaller than the CERN, NCSA HTTPd, and Apache servers. It also has many built-in features that would otherwise require CGI programs, for example site searches and enhanced server side includes. It can also compress and decompress files on the fly with its filter feature. 
It also has the ability to retrieve only part of a file with its ranges feature. It is released under the GNU General Public License. The current version is 1.18.3. For more information see the WN website at <url url="http://hopf.math.nwu.edu/">. <tag/AOLserver/ AOLserver is made by America Online. I'll admit that I was surprised by the features of a web server coming from AOL. In addition to the standard features, it supports database connectivity. Pages can query a database with Structured Query Language (SQL) commands. The database is accessed through Open Database Connectivity (ODBC). It also has a built-in search engine and TCL scripting. If that is not enough, you can add your own modules through the C Application Programming Interface (API). I almost forgot to mention support for 40 bit SSL. And you get all this for free! For more information visit the AOLserver site at <url url="http://www.aolserver.com/server/">. <tag/Zeus Server/ Zeus Server was developed by Zeus Technology. They claim that it is the fastest web server (using the SPECweb96 benchmark). The server can be configured and controlled from a web browser! It can limit processor and memory resources for CGI programs, and it executes them in a secure environment (whatever that means). It also supports unlimited virtual servers. It sells for $999 for the standard version. If you want the secure server (SSL), the price jumps to $1699. They are based outside the US, so 128 bit SSL is available everywhere. For more information visit the Zeus Technology website at <url url="http://www.zeus.co.uk">. The US website is at <url url="http://www.zeus.com">. I'll warn you, they are cocky about the fastest web server thing. But they don't even show up under top web servers in the Netcraft Surveys. <tag/CL-HTTP/ CL-HTTP stands for Common Lisp Hypermedia Server. If you are a Lisp programmer this server is for you. You can write your CGI scripts in Lisp. It has a web based setup function. It also supports all the standard server features. 
CL-HTTP is free and the source code is available. For more information visit the CL-HTTP website at <url url="http://www.ai.mit.edu/projects/iiip/doc/cl-http/home-page.html"> (could they make that URL any longer?). </descrip> If you have a commercial purpose (company web site or ISP), I would strongly recommend that you use Apache. If you are looking for easy setup at the expense of advanced features, then the Zeus Server wins hands down. I've also heard that the Netscape server is easy to set up. If your server is for internal use, you can be a bit more flexible. But unless one of the others has a feature that you just have to use, I would still recommend one of the three above. This is only a partial listing of all the servers available. For a more complete list visit Netcraft at <url url="http://www.netcraft.com/survey/servers.html"> or Web Compare at <url url="http://webcompare.internet.com">. <!-- Apache httpd ============================= --> <sect>Apache<label id="apache"> <p> The current version of Apache is 1.2.4. Version 1.3 is in beta testing. The main Apache site is at <url url="http://www.apache.org/">. Another good source of information is Apacheweek at <url url="http://www.apacheweek.com/">. The Apache documentation is OK, so I'm not going to go into detail on setting up Apache. The documentation is on the website and is included with the source (in HTML format). There are also text files included with the source, but the HTML version is better. The documentation should get a whole lot better once the Apache Documentation Project gets under way. Right now most of the documents are written by the developers. Not to discredit the developers, but the documents are a little hard to understand if you don't know the terminology. <sect1>Where to get <p> Apache is included in the Red Hat, Slackware, and OpenLinux distributions. Although they may not be the latest version, they are very reliable binaries. 
The bad news is you will have to live with their directory choices (which are totally different from each other and from the Apache defaults). The source is available from the Apache web site at <url url="http://www.apache.org/dist/">. Binaries are also available from the same place. You can also get binaries from Sunsite at <url url="ftp://sunsite.unc.edu/pub/Linux/apps/www/servers/">. And for those of us running Red Hat, the latest binary RPM file can usually be found in the contrib directory at <url url="ftp://ftp.redhat.com/pub/contrib/i386/">. If your server is going to be used for commercial purposes, it is highly recommended that you get the source from the Apache website and compile it yourself. The other option is to use a binary that comes with a major distribution such as Slackware, Red Hat, or OpenLinux. The main reason for this is security: an unknown binary could have a back door for hackers, or an unstable patch that could crash your system. Compiling it yourself also gives you more control over which modules are compiled in, and allows you to set the default directories. It's not that difficult to compile Apache, and besides, you're not a real Linux user until you compile your own programs ;) <sect1>Compiling and Installing <p> First untar the archive to a temporary directory. Next change to the src directory. Then edit the Configuration file if you want to include any special modules. The most commonly used modules are already included. There is no need to change the rules or makefile stuff for Linux. Next run the Configure shell script (<tt>./Configure</tt>). Make sure it reports Linux as the platform and gcc as the compiler. Next you may want to edit the httpd.h file to change the default directories. The server home (where the config files are kept) defaults to <tt>/usr/local/etc/httpd/</tt>, but you may want to change it to just <tt>/etc/httpd/</tt>. 
The document root (where the HTML pages are served from) defaults to <tt>/usr/local/etc/httpd/htdocs/</tt>, but I like the directory <tt>/home/httpd/html</tt> (the Red Hat default for Apache). If you are going to be using su-exec (see special features below), you may want to change that directory too. The document root can also be changed in the config files, but it is good to compile it in as well, just in case Apache can't find or read the config file. Everything else should be changed from the config files. Finally run make to compile Apache. If you run into problems with missing include files, check the following things. Make sure you have the kernel headers (include files) installed for your kernel version. Also make sure you have these symbolic links in place: <tscreen><verb>
/usr/include/linux should be a link to /usr/src/linux/include/linux
/usr/include/asm should be a link to /usr/src/linux/include/asm
/usr/src/linux should be a link to the Linux source directory (e.g. linux-2.0.30)
</verb></tscreen> Links can be made with <tt>ln -s</tt>; it works just like the cp command except that it makes a link instead of a copy (<tt>ln -s source-dir destination-link</tt>). When make is finished there should be an executable named httpd in the directory. This needs to be moved into a bin directory; <tt>/usr/sbin</tt> or <tt>/usr/local/sbin</tt> would be good choices. Copy the conf, logs, and icons sub-directories from the source to the server home directory. Next rename three of the files in the conf sub-directory to get rid of the <tt>-dist</tt> extension (e.g. <tt>httpd.conf-dist</tt> becomes <tt/httpd.conf/). There are also several support programs that are included with Apache. They are in the <tt/support/ directory and must be compiled and installed separately. Most of them can be made by using the makefile in that directory (which is made when you run the main <tt/Configure/ script). You don't need any of them to run Apache, but some of them make the administrator's job easier. 
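The link checks and the copy-and-rename steps above can be scripted. Below is a minimal sketch for a Bourne-compatible shell such as bash. The function names <tt/check_link/ and <tt/install_server_home/ are hypothetical helpers, not part of Apache. The sketch assumes <tt/readlink/ (from GNU coreutils) is installed and that conf, logs and icons sit at the top of the unpacked Apache source tree. <tt/install_server_home/ only renames a <tt>-dist</tt> file when no file of the target name exists, so an existing configuration is not overwritten.

<tscreen><verb>
#!/bin/sh
# Hypothetical helpers for the post-build steps described above.

# check_link LINK [TARGET]
# Succeeds if LINK is a symbolic link; if TARGET is given, the link
# must also point exactly to TARGET.
check_link() {
    link=$1
    want=$2
    if [ ! -L "$link" ]; then
        echo "MISSING: $link is not a symbolic link"
        return 1
    fi
    have=$(readlink "$link")     # readlink: assumed from GNU coreutils
    if [ -n "$want" ]; then
        if [ "$have" != "$want" ]; then
            echo "WRONG:   $link points to $have, expected $want"
            return 1
        fi
    fi
    echo "ok:      $link -> $have"
}

# install_server_home SRC_TOP SERVER_HOME
# Copies conf, logs and icons from the top of the Apache source tree
# into the server home, then strips the -dist suffix from config files.
install_server_home() {
    src=$1
    home=$2
    mkdir -p "$home" || return 1
    for dir in conf logs icons; do
        cp -R "$src/$dir" "$home/" || return 1
    done
    for f in "$home"/conf/*-dist; do
        [ -e "$f" ] || continue          # the glob matched nothing
        target=${f%-dist}
        if [ -e "$target" ]; then
            echo "keeping existing $target"
        else
            mv "$f" "$target"
        fi
    done
}
</verb></tscreen>

For example, <tt>check_link /usr/include/linux /usr/src/linux/include/linux</tt> reports whether the first link is in place, and <tt>check_link /usr/src/linux</tt> only checks that the link exists. <tt>install_server_home /tmp/apache_1.2.4 /usr/local/etc/httpd</tt> would fill the default server home. The source path in that example is a placeholder for wherever you unpacked the archive.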
<sect1>Configuring <p> Now you should have four files in your <tt/conf/ sub-directory (under your server home directory). The <tt/httpd.conf/ file sets up the server daemon (port number, user, etc.). The <tt/srm.conf/ file sets the root document tree, special handlers, etc. The <tt/access.conf/ file sets the base case for access. Finally <tt/mime.types/ tells the server what MIME type to send to the browser for each extension. The configuration files are pretty much self-documenting (plenty of comments), as long as you understand the lingo. You should read through them thoroughly before putting your server to work. Each configuration item is covered in the Apache documentation. The <tt/mime.types/ file is not really a configuration file. It is used by the server to translate file extensions into MIME types to send to the browser. Most of the common MIME types are already in the file, so most people should not need to edit it. As time goes on, more MIME types will be added to support new programs; the best thing to do then is to get a new <tt/mime.types/ file (and maybe a new version of the server). Always remember that when you change the configuration files you need to restart Apache, or send it the SIGHUP signal with <tt/kill/, for the changes to take effect. Make sure you send the signal to the parent process and not to any of the child processes. The parent usually has the lowest process id number. The process id of the parent is also in the <tt/httpd.pid/ file in the log directory. If you accidentally send it to one of the child processes, the child will die and the parent will restart it. I will not be walking you through the steps of configuring Apache. Instead I will deal with specific issues, choices to be made, and special features. I highly recommend that all users read through the security tips in the Apache documentation. They are also available from the Apache website at <url url="http://www.apache.org/docs/misc/security_tips.html">.
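The SIGHUP step can be wrapped in a small helper so the signal always goes to the parent. This is a sketch: the function name <tt/reload_httpd/ is hypothetical, and the default path assumes the stock <tt>/usr/local/etc/httpd</tt> server home with the pid file in its <tt/logs/ sub-directory; pass your own path if you moved things.

<tscreen><verb>
#!/bin/sh
# Sketch: send SIGHUP to the Apache parent process so it rereads its
# configuration files. reload_httpd is a hypothetical helper name.
reload_httpd() {
    # Default assumes the stock server home; override with an argument.
    pidfile=${1:-/usr/local/etc/httpd/logs/httpd.pid}
    if [ ! -r "$pidfile" ]; then
        echo "reload_httpd: cannot read $pidfile" >&2
        return 1
    fi
    # The pid file holds only the parent's id, so no child is signalled.
    kill -HUP "$(cat "$pidfile")"
}

# Usage (as root):  reload_httpd /etc/httpd/logs/httpd.pid
</verb></tscreen>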
<sect1>Hosting virtual websites <p> Virtual hosting is when one computer answers for more than one domain name. The old way was to give each virtual host its own IP address. The new way uses only one IP address, but it doesn't work correctly with browsers that don't support HTTP 1.1. My recommendation for businesses is to go with IP based virtual hosting until most people have browsers that support HTTP 1.1 (give it a year or two). This also gives you a more complete illusion of virtual hosting. While both methods can give you virtual mail capabilities (can someone confirm this?), only IP based virtual hosting can also give you virtual FTP. If it is for a club or personal page, you may want to consider shared IP virtual hosting. It should be cheaper than IP based hosting, and you will be saving precious IP addresses. You can also mix and match IP based and shared IP virtual hosts on the same server. For more information on virtual hosting visit Apacheweek at <url url="http://www.apacheweek.com/features/vhost">. <sect2>IP based virtual hosting <p> In this method each virtual host has its own IP address. By determining the IP address that the request was sent to, Apache and other programs can tell what domain to serve. This is an incredible waste of IP space. Take for example the servers where my virtual domain is kept. They have over 35,000 virtual accounts, which means 35,000 IP addresses. Yet I believe at last count they had fewer than 50 servers running. Setting this up is a two part process. The first is getting Linux set up to accept more than one IP address. The second is setting up Apache to serve the virtual hosts. The first step in setting up Linux to accept multiple IP addresses is to build a new kernel. This works best with a 2.0 series kernel (or higher). You need to include IP networking and IP aliasing support. If you need help with compiling the kernel see the <url name="kernel howto" url="http://sunsite.unc.edu/LDP/HOWTO/Kernel-HOWTO.html">.
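For reference, on a 2.0 series kernel the options involved are believed to be the ones below, as they appear in the kernel <tt/.config/ file after answering "y" in <tt/make config/. Treat this as a sketch: option names differ between kernel versions, so check the prompts in your own source tree rather than editing <tt/.config/ by hand.

<tscreen><verb>
# Kernel configuration fragment (2.0.x series, sketch)
# Networking support
CONFIG_NET=y
# TCP/IP networking
CONFIG_INET=y
# Network aliasing
CONFIG_NET_ALIAS=y
# IP: aliasing support
CONFIG_IP_ALIAS=y
</verb></tscreen>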
Next you need to set up each interface at boot. If you are using the Red Hat distribution then this can be done from the control panel. Start X as root; you should see a control panel. Then double click on network configuration. Next go to the interfaces panel and select your network card. Then click alias at the bottom of the screen. Fill in the information and click done. This will need to be done for each virtual host/IP address. If you are using other distributions you may have to do it manually. You can just put the commands in the <tt/rc.local/ file in <tt>/etc/rc.d</tt> (really they should go in with the networking stuff). You need an <tt/ifconfig/ and a <tt/route/ command for each device. The aliased addresses are given a sub-device of the main one. For example eth0 would have aliases eth0:0, eth0:1, eth0:2, etc. Here is an example of configuring an aliased device: <tscreen><verb>
ifconfig eth0:0 192.168.1.57
route add -host 192.168.1.57 dev eth0:0
</verb></tscreen> You can also add a broadcast address and a netmask to the ifconfig command. If you have a lot of aliases you may want to use a for loop to make it easier. For more information see the <url name="IP alias mini howto" url="http://sunsite.unc.edu/LDP/HOWTO/mini/IP-Alias.html">. Then you need to set up your domain name server (DNS) to serve these new domains. And if you don't already own the domain names, you need to contact the <url name="InterNIC" url="http://www.internic.net"> to register them. See the DNS-HOWTO for information on setting up your DNS. Finally you need to set up Apache to serve the virtual domains correctly. This is done near the end of the <tt/httpd.conf/ configuration file, where you are given an example to go by. All commands specific to a virtual host are put between that host's <tt/virtualhost/ directive tags. You can put almost any command in there. Usually you set up a different document root, script directory, and log files.
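The loop mentioned above might look like the following sketch for <tt/rc.local/. It assumes all the aliases sit on one interface and share the same first three octets; the function name <tt/add_ip_aliases/ and the <tt/DRY_RUN/ switch (which prints the commands instead of running them) are hypothetical conveniences, not part of any standard script. Broadcast and netmask options are left out for brevity.

<tscreen><verb>
#!/bin/sh
# Sketch for rc.local: bring up one alias per host address on a
# single interface. add_ip_aliases and DRY_RUN are hypothetical names.
# With DRY_RUN=1 the commands are printed instead of executed.
add_ip_aliases() {
    dev=$1          # physical interface, e.g. eth0
    net=$2          # first three octets, e.g. 192.168.1
    shift 2         # remaining arguments: last octet of each alias
    run=${DRY_RUN:+echo}
    n=0
    for host in "$@"; do
        $run ifconfig "$dev:$n" "$net.$host"
        $run route add -host "$net.$host" dev "$dev:$n"
        n=$((n + 1))
    done
}

# Example: eth0:0 = 192.168.1.57, eth0:1 = .58, eth0:2 = .59
#   add_ip_aliases eth0 192.168.1 57 58 59
</verb></tscreen>

Running it once with <tt/DRY_RUN=1/ is a cheap way to check the generated commands before they go into your boot scripts.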
You can have an almost unlimited number of virtual hosts by adding more <tt/virtualhost/ directive tags. In rare cases you may need to run separate servers, if a directive is needed for a virtual host but is not allowed inside the virtual host tags. This is done using the <tt/BindAddress/ directive. Each server will have a different name and its own setup files, and each server only responds to the one IP address given by its <tt/BindAddress/ directive. This is an incredible waste of system resources. <sect2>Shared IP virtual hosting <p> This is a new way to do virtual hosting. It uses a single IP address, thus conserving IP addresses for real machines (not virtual ones). In the same example used above, those 35,000 virtual hosts would only take 50 IP addresses (one for each machine). This is done by using the new HTTP 1.1 protocol: the browser tells the server which site it wants (in the Host: header) when it sends the request. The problem is that browsers that don't support HTTP 1.1 will get the server's main page, which could be set up to provide a menu of the virtual hosts available. That ruins the whole illusion of virtual hosting, the illusion that you have your own server. The setup is much simpler than IP based virtual hosting. You still need to register your domain with the InterNIC and set up your DNS. This time the DNS points to the same IP address as the original domain. Then Apache is set up the same as before. Since you are using the same IP address in the <tt/virtualhost/ tags, it knows you want shared IP virtual hosting. There are several workarounds for older browsers; I'll explain the best one. First you need to make your main pages a virtual host (either IP based or shared IP). This frees up the main page for a link list to all your virtual hosts. Next you need to make a back door for the old browsers to get in. This is done using the <tt/ServerPath/ directive for each virtual host inside the <tt/virtualhost/ directive.
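A sketch of the matching <tt/httpd.conf/ entries, with hypothetical addresses, names and paths: the first block is an IP based host on an alias address, the second a shared IP host on the main server's address with a <tt/ServerPath/ back door for older browsers. Check the directive details against the Apache documentation for your version.

<tscreen><verb>
# IP based virtual host on an alias address (example values)
<VirtualHost 192.168.1.57>
ServerName www.example.com
DocumentRoot /home/httpd/example/html
ScriptAlias /cgi-bin/ /home/httpd/example/cgi-bin/
TransferLog /home/httpd/example/logs/access_log
ErrorLog /home/httpd/example/logs/error_log
</VirtualHost>

# Shared IP virtual host on the main server's address (example values)
<VirtualHost 192.168.1.10>
ServerName www.mysite.com
ServerPath /mysite/
DocumentRoot /home/httpd/mysite/html
</VirtualHost>
</verb></tscreen>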
For example, by adding <tt>ServerPath /mysite/</tt> to www.mysite.com, old browsers would be able to access the site at www.mysite.com/mysite/. Then you put a default page on the main server that politely tells them to get a new browser, and lists links to the back doors of all the sites you host on that machine. When an old browser accesses the site it will be sent to the main page and get a link to the correct page. New browsers will never see the main page and will go directly to the virtual hosts. You must remember to keep all of your links relative within the web sites, because the pages will be accessed from two different URLs (www.mysite.com and www.mysite.com/mysite/). I hope I didn't lose you there, but it's not an easy workaround. Maybe you should consider IP based hosting after all. A very similar workaround is also explained on the Apache website at <url url="http://www.apache.org/manual/host.html">. If anyone has a great resource for shared IP hosting, I would like to know about it. It would be nice to know what percentage of browsers out there support HTTP 1.1, and to have a list of which browsers and versions support it. <sect1>CGI scripts <p> There are two different ways to give your users CGI script capability. The first is to make everything ending in <tt/.cgi/ a CGI script. The second is to make script directories (usually named <tt/cgi-bin/). You could also use both methods. For either method to work the scripts must be world executable, and interpreted scripts (shell, Perl, etc.) must be world readable too, because the interpreter has to read them (<tt/chmod 755/). By giving your users script access you are creating a big security risk. Be sure to do your homework to minimize it. I prefer the first method, especially for complex scripting. It allows you to put scripts in any directory; I like to put my scripts with the web pages they work with. For sites with a lot of scripts it looks much better than having a directory full of scripts. This is simple to set up.
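Whichever method you pick, a trivial script is a quick way to check that it works. The following is a minimal sketch of a CGI script in <tt>/bin/sh</tt> (the file name <tt/hello.cgi/ is just an example): a CGI program prints a header, a blank line, and then the document. <tt/REQUEST_METHOD/ and <tt/QUERY_STRING/ are standard CGI environment variables set by the server.

<tscreen><verb>
#!/bin/sh
# hello.cgi: minimal CGI test script (sketch; the name is an example).
# Output: a Content-type header, a blank line, then the document body.
hello_cgi() {
    echo "Content-type: text/plain"
    echo ""
    echo "Request method: ${REQUEST_METHOD:-unknown}"
    echo "Query string: ${QUERY_STRING:-}"
}

hello_cgi
</verb></tscreen>

Install it with <tt/chmod 755 hello.cgi/; requesting it as <tt>hello.cgi?a=1</tt> should show the method and the query string.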
To set up the first method, uncomment the <tt/.cgi/ handler (<tt/AddHandler cgi-script .cgi/) at the end of the <tt/srm.conf/ file. Then make sure all your directories have <tt/Options ExecCGI/ or <tt/Options All/ in the <tt/access.conf/ file. Making script directories is considered more secure. To make a script directory you use the <tt/ScriptAlias/ directive in the <tt/srm.conf/ file. The first argument is the alias and the second is the actual directory. For example <tt>ScriptAlias /cgi-bin/ /usr/httpd/cgi-bin/</tt> would make <tt>/usr/httpd/cgi-bin</tt> able to execute scripts. That directory would be used whenever someone asked for the directory <tt>/cgi-bin/</tt>. For security reasons you should also set <tt>Options None</tt> and <tt>AllowOverride None</tt> for that directory in the <tt/access.conf/ file (just uncomment the example that is there). Also, do not make your script directories subdirectories of your web page directories. For example if you are serving pages from <tt>/home/httpd/html/</tt>, don't make the script directory <tt>/home/httpd/html/cgi-bin</tt>; instead make it <tt>/home/httpd/cgi-bin</tt>. If you want your users to have their own script directories you can use multiple <tt/ScriptAlias/ commands. Virtual hosts should have their <tt/ScriptAlias/ command inside the <tt/virtualhost/ directive tags. Does anyone know a simple way to allow all users to have a cgi-bin directory without individual ScriptAlias commands? <sect1>Users Web Directories <p> There are two different ways to handle user web directories. The first is to have a subdirectory under each user's home directory (usually <tt/public_html/). The second is to have an entirely different directory tree for web directories. With both methods make sure you set the access options for these directories in the <tt/access.conf/ file. The first method is already set up in Apache by default. Whenever a request for <tt>/~bob/</tt> comes in, Apache looks for the <tt/public_html/ directory in bob's home directory.
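For the <tt/public_html/ method, the access options might be set with a block like this in <tt/access.conf/. This is a sketch: the wildcard path assumes home directories live under <tt>/home</tt>, and the option list is one reasonably cautious choice (no overrides, symbolic links only when owners match, includes without exec), not a requirement.

<tscreen><verb>
# access.conf: limits for every user's public_html (example choice)
<Directory /home/*/public_html>
AllowOverride None
Options Indexes SymLinksIfOwnerMatch IncludesNOEXEC
</Directory>
</verb></tscreen>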
You can change the directory with the <tt/UserDir/ directive in the <tt/srm.conf/ file. This directory must be world readable and executable. This method creates a security risk because, for Apache to reach the directory, the user's home directory must be world executable. The second method is easy to set up. You just need to change the <tt>UserDir</tt> directive in the <tt>srm.conf</tt> file. It has many different formats; you may want to consult the Apache documentation for clarification. If you want each user to have their own directory under <tt>/home/httpd/</tt>, you would use <tt>UserDir /home/httpd</tt>. Then when the request <tt>/~bob/</tt> comes in it would translate to <tt>/home/httpd/bob/</tt>. Or if you want to have a subdirectory under bob's directory you would use <tt>UserDir /home/httpd/*/html</tt>. This would translate to <tt>/home/httpd/bob/html/</tt> and would allow you to have a script directory too (for example <tt>/home/httpd/bob/cgi-bin/</tt>). <sect1>Daemon mode vs. Inetd mode <p> There are two ways that Apache can be run. One is as a daemon that is always running (Apache calls this standalone). The second is from the inetd super-server. Daemon mode is far superior to inetd mode, and Apache is set up for daemon mode by default. The only reason to use inetd mode is for very low use applications, such as internal testing of scripts, a small company intranet, etc. Inetd mode will save memory because Apache will be loaded only as needed; only the inetd daemon will remain in memory. If you don't use Apache that often you may just want to keep it in daemon mode and start it when you need it. Then you can kill it when you are done (be sure to kill the parent and not one of the child processes). To set up inetd mode you need to edit a few files. First see if http is already listed in <tt>/etc/services</tt>. If it's not, then add it: <tscreen><verb>
http    80/tcp
</verb></tscreen> Right after 79 (finger) would be a good place.
Then you need to edit the <tt>/etc/inetd.conf</tt> file and add the line for Apache: <tscreen><verb>
http    stream  tcp     nowait  root    /usr/sbin/httpd httpd
</verb></tscreen> Be sure to change the path if you have Apache in a different location. And the second httpd is not a typo; the inet daemon requires it (it is the name the program is started under). You also need to change the <tt/ServerType/ directive in <tt/httpd.conf/ from <tt/standalone/ to <tt/inetd/. If you are not currently using the inet daemon, you may want to comment out the rest of the lines in the file so you don't activate other services as well (FTP, finger, telnet, and many other things are usually run from this daemon). If you are already running the inet daemon (<tt/inetd/), then you only need to send it the SIGHUP signal (via kill; see kill's man page for more info) or reboot the computer for the changes to take effect. If you are not running <tt>inetd</tt> then you can start it manually. You should also add it to your init files so it is loaded at boot (the <tt/rc.local/ file may be a good choice). <sect1>Allowing put and delete commands <p> The newer web publishing tools support this new method of uploading web pages by HTTP (instead of FTP). Some of these products don't even support FTP anymore! Apache does support this, but it lacks a script to handle the requests. This script could be a big security hole, so be sure you know what you are doing before attempting to write or install one. If anyone knows of a script that works, let me know and I'll include its address here. For more information go to Apacheweek's article at <url url="http://www.apacheweek.com/features/put">. <sect1>User Authentication/Access Control <p> This is one of my favorite features. It allows you to password protect a directory or a file without using CGI scripts. It also allows you to deny or grant access based on the IP address or domain name of the client. That is a great feature for keeping jerks out of your message boards and guest books (you get the IP address or domain name from the log files).
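As a sketch of what a protected directory contains, here is an <tt/.htaccess/ file that asks for any valid user listed in a password file. It assumes the directory has <tt/AllowOverride AuthConfig/ set in <tt/access.conf/; the realm name <tt/Members/ and the password file path are hypothetical. Keeping the password file outside the document tree means it cannot be fetched by URL at all.

<tscreen><verb>
# .htaccess: password protect this directory (example values)
AuthType Basic
AuthName Members
AuthUserFile /home/httpd/passwd/.htpasswd
<Limit GET POST>
require valid-user
</Limit>
</verb></tscreen>

The password file can be created with the <tt/htpasswd/ program from the Apache <tt/support/ directory, for example <tt>htpasswd -c /home/httpd/passwd/.htpasswd bob</tt>; the <tt/-c/ option creates the file, so leave it off when adding further users.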
To allow user authentication the directory must have <tt>AllowOverride AuthConfig</tt> set in the <tt/access.conf/ file. To allow access control (by domain or IP address), <tt>AllowOverride Limit</tt> must be set for that directory. Setting up the directory involves putting an <tt/.htaccess/ file in it. For user authentication it is usually used with an <tt/.htpasswd/ file and optionally an <tt/.htgroup/ file. Those files can be shared among multiple <tt/.htaccess/ files if you wish. For security reasons I recommend that everyone use these directives in their <tt/access.conf/ file: <tscreen><verb>
<files ~ "/\.ht">
order deny,allow
deny from all
</files>
</verb></tscreen> If you are not the administrator of the system you can also put it in your <tt/.htaccess/ file if <tt/AllowOverride Limit/ is set for your directory. This directive will prevent people from looking into your access control files (<tt/.htaccess/, <tt/.htpasswd/, etc.). There are many different options and file types that can be used with access control, so describing the files is beyond the scope of this document. For information on how to set up user authentication see the Apacheweek feature at <url url="http://www.apacheweek.com/features/userauth"> or the NCSA pages at <url url="http://hoohoo.ncsa.uiuc.edu/docs-1.5/tutorials/user.html">. <sect1>su-exec <p> The su-exec feature runs CGI scripts as the user who owns them. Normally they are run as the user of the web server (usually nobody). This allows users to access their own files in CGI scripts without making them world writable (a security hole). But if you are not careful you can create a bigger security hole by using the su-exec code. The su-exec code does security checks before executing the scripts, but if you set it up wrong you will have a security hole. The su-exec code is not for amateurs. Don't use it if you don't know what you are doing. You could end up with a gaping security hole where your users can gain root access to your system.
Do not modify the code for any reason. Be sure to read all the documentation carefully. The su-exec code is hard to set up on purpose, to keep the amateurs out (everything must be done manually; there is no makefile and no install script). The su-exec code resides in the <tt/support/ directory of the source. First you need to edit the <tt/suexec.h/ file for your system. Then you need to compile the su-exec code with this command: <tscreen><verb>
gcc suexec.c -o suexec
</verb></tscreen> Then copy the suexec executable to the proper directory. The Apache default is <tt>/usr/local/etc/httpd/sbin/</tt>. This can be changed by editing <tt/httpd.h/ in the Apache source and recompiling Apache. Apache will only look in this directory; it will not search the path. Next the file needs to be owned by root (<tt/chown root suexec/) and the suid bit needs to be set (<tt/chmod 4711 suexec/). Finally restart Apache; it should display a message on the console that su-exec is being used. CGI scripts should be set world executable as normal. They will automatically be run as the owner of the CGI script. If you set the SUID (set user id) bit on the CGI scripts they will not run. If the directory or file is world or group writable the script will not run. Scripts owned by system users (root, bin, etc.) will not be run. For other security conditions that must be met see the su-exec documentation. If you are having problems see the su-exec log file named <tt/cgi.log/. Su-exec does not work if you are running Apache from inetd; it only works in daemon mode. It will be fixed in the next version because there will be no inetd mode. If you like playing around in source code, you can edit <tt/http_main.c/. You want to get rid of the line where Apache announces that it is using the su-exec wrapper (it wrongly prints this in front of the output of everything). Be sure to read the Apache documentation on su-exec.
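Before blaming su-exec when a script refuses to run, you can check some of the permission rules listed above from the shell. This is a sketch under stated assumptions: the helper name <tt/check_suexec_script/ is hypothetical, it covers only the set-uid and group/world writable checks, and it does not test ownership or the other rules compiled into <tt/suexec.h/, so <tt/cgi.log/ remains the authoritative answer.

<tscreen><verb>
#!/bin/sh
# Sketch: report the permission problems that make su-exec refuse to
# run a CGI script. Covers only the set-uid bit and a group/world
# writable file or directory; ownership rules from suexec.h and the
# other checks are NOT tested. check_suexec_script is hypothetical.
check_suexec_script() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "$script: no such file"
        return 1
    fi
    dir=$(dirname "$script")
    problems=0
    # find -perm -MODE matches when all bits in MODE are set
    if [ -n "$(find "$script" -prune -perm -4000)" ]; then
        echo "$script: set-uid bit is set"
        problems=1
    fi
    if [ -n "$(find "$script" -prune \( -perm -020 -o -perm -002 \))" ]; then
        echo "$script: file is group or world writable"
        problems=1
    fi
    if [ -n "$(find "$dir" -prune \( -perm -020 -o -perm -002 \))" ]; then
        echo "$dir: directory is group or world writable"
        problems=1
    fi
    return "$problems"
}

# Usage: check_suexec_script /home/httpd/bob/cgi-bin/counter.cgi
</verb></tscreen>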
The su-exec documentation is included with the source and is available on the Apache web site at <url url="http://www.apache.org/docs/suexec.html">. <sect1>Imagemaps <p> Apache has the ability to handle server side imagemaps. Imagemaps are images on web pages that take users to different locations depending on where they click. To enable imagemaps, first make sure the imagemap module is installed (it's one of the default modules). Next you need to uncomment the <tt/.map/ handler at the end of the <tt/srm.conf/ file. Now all files ending in <tt/.map/ will be treated as imagemap files. Imagemap files map different areas of the image to separate links. Apache uses map files in the standard NCSA format. Here is an example of using a map file in a web page: <tscreen><verb>
<a href="/map/mapfile.map">
<img src="picture.gif" ISMAP>
</a>
</verb></tscreen> In this example <tt/mapfile.map/ is the map file, and <tt/picture.gif/ is the image to click on. There are many programs that can generate NCSA compatible map files, or you can create them yourself. For a more detailed discussion of imagemaps and map files see the Apacheweek feature at <url url="http://www.apacheweek.com/features/imagemaps">. <sect1>SSI/XSSI <p> Server Side Includes (SSI) add dynamic content to otherwise static web pages. The includes are embedded in the web page as comments. The web server then parses these includes and sends the results to the browser. SSI can add headers and footers to documents, add the date the document was last updated, and execute a system command or a CGI script. With the new eXtended Server Side Includes (XSSI) you can do a whole lot more. XSSI adds variables and flow control statements (if, else, etc.). It's almost like having a programming language to work with. Parsing all HTML files for SSI commands would waste a lot of system resources. Therefore you need to distinguish normal HTML files from those that contain SSI commands. This is usually done by changing the extension of the SSI enhanced HTML files.
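The <tt/srm.conf/ and <tt/access.conf/ entries for the <tt/.shtml/ convention look roughly like this sketch. The directory path is an example, and <tt/IncludesNOEXEC/ can be used in place of <tt/Includes/ to refuse the commands that run programs.

<tscreen><verb>
# srm.conf: serve .shtml as HTML and parse it for SSI/XSSI commands
AddType text/html .shtml
AddHandler server-parsed .shtml

# access.conf: allow includes in the document tree (example path)
<Directory /home/httpd/html>
Options Indexes FollowSymLinks Includes
</Directory>
</verb></tscreen>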
Usually the <tt/.shtml/ extension is used. To enable SSI/XSSI, first make sure that the includes module is installed. Then edit <tt/srm.conf/ and uncomment the <tt/AddType/ and <tt/AddHandler/ directives for <tt/.shtml/ files. Finally you must set <tt/Options Includes/ for all directories where you want to run SSI/XSSI files. This is done in the <tt/access.conf/ file. Now all files with the extension <tt/.shtml/ will be parsed for SSI/XSSI commands. Another way of enabling includes is to use the <tt/XBitHack/ directive. If you turn this on, Apache checks whether the file is executable by its owner. If it is, and <tt/Options Includes/ is on for that directory, then it is treated as an SSI file. This only works for files with the MIME type text/html (<tt/.html/ and <tt/.htm/ files). This is not the preferred method. There is a security risk in allowing SSI to execute system commands and CGI scripts. Therefore it is possible to lock that feature out with <tt/Options IncludesNOEXEC/ instead of <tt/Options Includes/ in the <tt/access.conf/ file. All the other SSI commands will still work. For more information see the Apache mod_include documentation that comes with the source. It is also available on the website at <url url="http://www.apache.org/docs/mod/mod_include.html">. For a more detailed discussion of SSI/XSSI implementation see the Apacheweek feature at <url url="http://www.apacheweek.com/features/ssi">. For more information on SSI commands see the NCSA documentation at <url url="http://hoohoo.ncsa.uiuc.edu/docs/tutorials/includes.html">. For more information on XSSI commands go to <url url="ftp://pageplus.com/pub/hsf/xssi/xssi-1.1.html">. <sect1>Module system <p> Apache can be extended to support almost anything with modules. There are a lot of modules already in existence, but only the general interest modules are included with Apache. For links to existing modules go to the Apache Module Registry at <url url="http://www.zyzzyva.com/module_registry/">.
For module programming information go to <url url="http://www.zyzzyva.com/module_registry/reference/">. <!-- Web Server Add-ons ======================= --> <sect>Web Server Add-ons <p> Sorry, this section has not been written yet. Coming soon: mSQL, PHP/FI, cgiwrap, FastCGI, Microsoft FrontPage extensions, and more. <!-- WWW = FAQ == SECTION ======================================== --> <sect>FAQ <p> There aren't any frequently asked questions - yet... <!-- For further reading SECTION ================================= --> <sect>For further reading <p> <sect1>O'Reilly & Associates Books <p> In my humble opinion O'Reilly & Associates make the best technical books on the planet. They focus mainly on Internet, Unix and programming related topics. They start off slow with plenty of examples, and when you finish the book you're an expert. I think you could get by if you only read half of the book. They also add some humor to otherwise boring subjects. They have great books on HTML, Perl, CGI programming, Java, JavaScript, C/C++, Sendmail, Linux and much, much more. And the fast moving topics (like HTML) are updated and revised about every 6 months or so. So visit the <url url="http://www.ora.com/" name="O'Reilly & Associates"> web site or stop by your local book store for more info. And remember, if it doesn't say O'Reilly & Associates on the cover, someone else probably wrote it. <sect1>Internet Request For Comments (RFC) <p> <itemize>
<item>RFC1866: T. Berners-Lee and D. Connolly, "Hypertext Markup Language - 2.0", 11/03/1995
<item>RFC1867: E. Nebel and L. Masinter, "Form-based File Upload in HTML", 11/07/1995
<item>RFC1942: D. Raggett, "HTML Tables", 05/15/1996
<item>RFC1945: T. Berners-Lee, R. Fielding and H. Nielsen, "Hypertext Transfer Protocol -- HTTP/1.0", 05/17/1996
<item>RFC1630: T. Berners-Lee, "Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web", 06/09/1994
<item>RFC1959: T. Howes and M. Smith, "An LDAP URL Format", 06/19/1996
</itemize> </article>