Search Engine Safe URLs (often abbreviated SES) are attractive to Web developers and business owners because they "pretty up" the browser address bar, help search engines crawl site content, and generally make it easier to share URLs to content deep within a Web site. The advantages of SES URLs are covered exhaustively elsewhere on the Web if you want more reasons to use them. For information on how to set up and configure Apache's mod_rewrite, including the creation of SES URLs for a Mach-ii site, read on...

All major Web servers support configuring SES URLs. Microsoft's Internet Information Services (IIS) accomplishes this through ISAPI rewrite DLLs. Apache employs a module-based configuration, with a module called mod_rewrite accomplishing the task. To get started with Apache you'll need to ensure your Apache installation can make use of the mod_rewrite module. Depending on which operating system Apache is installed on and how it was installed, mod_rewrite may or may not be available. To check, look for the mod_rewrite.so file in Apache's modules directory.

Typical mod_rewrite.so locations by operating system:

Windows:
C:\Program Files\Apache Group\Apache2\modules\

OS X:
If you compiled and installed Apache2 on your own, you're probably already bored with this article. Nevertheless, the modules directory is typically located here: /usr/local/apache2/modules/

If you're using Leopard with the version of Apache2 that comes pre-installed, Apache2 will be located here:
/private/etc/apache2/

There won't be an Apache2 "modules" directory, but all modules will be located here:
/usr/libexec/apache2/

If you don't find mod_rewrite.so in your modules location, you'll likely need to re-install Apache2. Operating system specific Apache2 installation instructions are located here: Apache 2.0.x or Apache 2.2.x.

After verifying the existence of mod_rewrite.so you'll need to ensure the module is loaded when Apache starts up. This is done with a simple LoadModule directive in httpd.conf, Apache's configuration file. Open the configuration file and find the existing LoadModule section. Verify there's an uncommented line that looks something like the following; the exact path will depend on where your Apache2 modules directory is located on your system:

LoadModule rewrite_module modules/mod_rewrite.so
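
If you'd rather not eyeball a long httpd.conf, the check can be scripted. The following is only a minimal Python sketch: the function name rewrite_module_enabled is made up for this illustration, and it looks at the text you hand it without following Include directives or confirming that the .so file exists.

```python
import re

# Matches an active (uncommented) LoadModule line for mod_rewrite.
# The module path is not inspected because it varies from system to system.
LOAD_REWRITE = re.compile(
    r"^[ \t]*LoadModule[ \t]+rewrite_module[ \t]+\S+", re.MULTILINE
)


def rewrite_module_enabled(conf_text: str) -> bool:
    """Return True if conf_text has an uncommented LoadModule rewrite_module line."""
    return LOAD_REWRITE.search(conf_text) is not None


if __name__ == "__main__":
    sample = (
        "#LoadModule proxy_module modules/mod_proxy.so\n"
        "LoadModule rewrite_module modules/mod_rewrite.so\n"
    )
    print(rewrite_module_enabled(sample))
```

To use it against a real file, read the file's contents into a string first and pass that string in.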

In order to begin using mod_rewrite in Apache you'll need to turn the RewriteEngine on. This is typically done inside a VirtualHost block if you're serving up multiple Web sites in Apache. With the RewriteEngine turned on, you'll need to tell Apache exactly how you want it to rewrite URLs by issuing one or more RewriteRule instructions. Each rule pairs a regular expression pattern, which is matched against the URL path requested by the browser, with a substitution that points to a resource on the Web server (or to another URL). Let's look at a really common example.

Example One: Forcing the www Prefix:
Suppose you've defined a VirtualHost for www.yourdomain.com which serves content for your Web site. Currently, visitors who omit the "www" prefix are not finding your Web site but you'd like your site to work whether users remember to type the prefix or not. You can accomplish this by adding a new VirtualHost block for "yourdomain.com" that uses Apache's rewrite engine.

# This virtual host block will force a redirect of yourdomain.com to www.yourdomain.com.
<VirtualHost 127.0.0.1:80>
ServerAdmin you@yourdomain.com
DocumentRoot /path/to/yourdomain.com
ServerName yourdomain.com

DirectoryIndex index.cfm index.htm

RewriteEngine on
RewriteOptions MaxRedirects=10
RewriteLog logs/yourdomain.com-rewrite_log
RewriteCond %{HTTP_HOST} !^www\.yourdomain\.com [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule ^/(.*) http://www.yourdomain.com/$1 [L,R]
</VirtualHost>

The first four lines in this httpd.conf excerpt are common VirtualHost settings. The next three lines turn the rewrite engine on, set the maximum number of mod_rewrite redirects (to ensure infinite redirects do not occur), and specify a log file to record the rewrite actions Apache takes. Having a separate rewrite log is handy if you want to know how many people are coming to your site by omitting the www prefix in your URL.

The last three lines configure the rewrite conditions and rule, and accomplish our task. The RewriteCond directive defines a rule condition in the form RewriteCond TestString ConditionPattern. You can include one or more RewriteCond directives before a RewriteRule directive. Apache will execute the RewriteRule directive (causing the browser to redirect to a new URL) if and only if the URL matches the rule's pattern and all of the defined conditions are met. The first condition asks: does the Host header sent by the browser NOT begin with www.yourdomain.com? The [NC] flag makes that pattern test case-insensitive. The second condition asks: is the Host header NOT empty? It keeps requests that arrive without a Host header from being redirected. If both conditions are met, the RewriteRule will be executed, which will cause the browser to redirect from the non-www address to http://www.yourdomain.com.

What's nice about the RewriteRule directive here is that it carries the rest of the original URL over to the www URL when the redirect occurs. This means a path like yourdomain.com/go/tutorials/ will be passed on to www.yourdomain.com. This is accomplished with the $1 variable, which says: take the first parenthetical expression (the (.*) part of the RewriteRule pattern, which captures everything in the URL path after the leading slash) and tack it on to http://www.yourdomain.com/. Query string parameters are not part of what the pattern matches; because the substitution contains no query string of its own, mod_rewrite passes the original query string along unchanged. Finally, the [L,R] flags instruct Apache to stop the rewriting process (the L flag) and force an external redirect (the R flag), replacing the browser's address with the new URL specified in the rewrite rule.

Essentially, this performs a redirect with the HTTP response code 302 (Moved Temporarily). If you want to redirect with a different 3xx response code, you can provide the number in the form R=code, for example [L,R=301] for a permanent redirect. The code you choose can affect how search engines and crawlers treat the old and new URLs, so be careful.
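
To make the decision logic concrete, here is a rough Python model of the two conditions and the rule. It is a sketch for reasoning about the behavior, not how mod_rewrite is implemented, and the function name www_redirect and its parameters are invented for this illustration.

```python
import re
from typing import Optional

# Python stand-ins for the first RewriteCond pattern and the RewriteRule pattern.
WWW_HOST = re.compile(r"^www\.yourdomain\.com", re.IGNORECASE)  # [NC]
RULE = re.compile(r"^/(.*)")


def www_redirect(host: str, path: str, query: str = "") -> Optional[str]:
    """Return the redirect target, or None when no redirect would be issued."""
    if WWW_HOST.search(host):
        return None  # condition 1 not met: host already starts with www
    if host == "":
        return None  # condition 2 not met: empty Host header
    match = RULE.search(path)
    if match is None:
        return None
    target = "http://www.yourdomain.com/" + match.group(1)
    # The substitution has no query string, so the original one is kept.
    return target + ("?" + query if query else "")


if __name__ == "__main__":
    print(www_redirect("yourdomain.com", "/go/tutorials/"))
    print(www_redirect("www.yourdomain.com", "/go/tutorials/"))
```

A request for yourdomain.com/go/tutorials/ produces a target on the www host with the same path, while a request that already uses the www host produces no redirect at all.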

Example Two: Using SES URLs with Mach-ii:
Really common in today's Web world are "go" URLs (a specific form of SES URL) in the form: yourdomain.com/go/something. They're especially prevalent on large, enterprise sites like adobe.com. Adobe uses these URLs to provide short, product-specific links that are easy to remember and pass around. Event-based, front-controller frameworks such as Fusebox and Mach-ii filter all requests through a single server-side resource (typically index.cfm). Adding a couple of lines to your Apache configuration file will allow you to integrate these types of URLs with a ColdFusion framework like Mach-ii. Consider the following code:

# This is the main virtual host definition for www.yourdomain.com.
<VirtualHost 127.0.0.1:80>
ServerAdmin you@yourdomain.com
DocumentRoot /path/to/yourdomain.com
ServerName www.yourdomain.com

DirectoryIndex index.cfm index.htm

RewriteEngine on
RewriteLog logs/yourdomain.com-rewrite_log
RewriteRule ^/go/([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+) /index.cfm?go=$1:$2.$3 [PT,L]
RewriteRule ^/go/([A-Za-z0-9_-]+)/([A-Za-z0-9_-]+) /index.cfm?go=$1:$2 [PT,L]
RewriteRule ^/go/([A-Za-z0-9_-]+)/?$ /index.cfm?go=$1 [PT,L]
</VirtualHost>

There are three rewrite rules above, listed from most specific to least specific. The order is intentional: it ensures the most specific rules are evaluated and applied before more general ones. As in the previous example, each RewriteRule directive has three main parts: the directive name itself, the regular expression pattern to match, and the replacement expression. An optional list of flags in square brackets follows. Let's begin by explaining the last rule since it is the simplest.

Regular expression pattern: ^/go/([A-Za-z0-9_-]+)/?$
Replacement expression: /index.cfm?go=$1 [PT,L]

This regular expression pattern says: match any URL path that begins with /go/, followed by one or more letters, digits, underscores, or hyphens, and optionally ending in a trailing forward slash. If the pattern is matched, Apache will internally rewrite the request to /index.cfm?go=something. The "something" is the content in the regular expression pattern following /go/. For example, if a visitor typed yourdomain.com/go/tutorials or yourdomain.com/go/tutorials/ (notice the trailing slash), the request would be served by yourdomain.com/index.cfm?go=tutorials. The $1 in the replacement expression is mod_rewrite's back-reference to the content captured after /go/. The [PT,L] portion holds the pass through and last rule flags. PT hands the rewritten URL back to Apache's normal URL mapping so it is treated as a URL path rather than a file path, and L stops the rewrite engine from processing any further rules for this request. Remember the [L,R] in the first example? The R flag instructed mod_rewrite to force an external redirect and change the address displayed in the browser. Without R, the rewrite is a "soft," internal one that does NOT change the browser's address. In other words, even though the URL gets rewritten to yourdomain.com/index.cfm?go=tutorials, the address bar will continue to show yourdomain.com/go/tutorials.

Mach-ii sites with little or "shallow" content may need only this last rewrite rule to translate SES URLs to events defined in mach-ii.xml. However, Mach-ii version 1.5 introduced a really useful feature called modules. Modules allow for the creation or consumption of sub-applications. Continuing my illustration of tutorial URLs, assume you have a site with several tutorials. Each tutorial might have several pages of content. If you wanted to allow people to deep-link to any page of any tutorial with an SES URL, you could utilize Mach-ii's modules feature to accomplish this. Vanilla Mach-ii module URLs for this type of setup might look like this:

URL to the main tutorials page:
http://www.yourdomain.com/index.cfm?go=tutorials

URL to the main SES tutorial page:
http://www.yourdomain.com/index.cfm?go=tutorials:ses_urls

URL to page 1 of the SES tutorial:
http://www.yourdomain.com/index.cfm?go=tutorials:ses_urls.page1

URL to page 2 of the SES tutorial:
http://www.yourdomain.com/index.cfm?go=tutorials:ses_urls.page2

Translating the above URLs into SES URLs, you might engineer something like this:
http://www.yourdomain.com/go/tutorials
http://www.yourdomain.com/go/tutorials/ses_urls
http://www.yourdomain.com/go/tutorials/ses_urls/page1
http://www.yourdomain.com/go/tutorials/ses_urls/page2
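
The mapping between the two URL styles is mechanical, which a few lines of Python can show. This is just an illustrative sketch; to_ses_path is a name invented here and is not part of Mach-ii or mod_rewrite.

```python
import re


def to_ses_path(go_value: str) -> str:
    """Turn a Mach-ii style go value (module:event.page) into a /go/ path."""
    # The ":" and "." separators both become "/" in the SES form.
    parts = re.split(r"[:.]", go_value)
    return "/go/" + "/".join(parts)


if __name__ == "__main__":
    for value in ("tutorials", "tutorials:ses_urls", "tutorials:ses_urls.page1"):
        print(value, "->", to_ses_path(value))
```

The rewrite rules in example two perform the same translation in the opposite direction.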

Looking back to example two above, the first and second RewriteRule directives accomplish translating the SES URLs into Mach-ii module URLs. The second rewrite rule, being slightly more specific than the third, translates URLs in the form /go/tutorials/ses_urls into /index.cfm?go=tutorials:ses_urls. Each parenthetical section of the regular expression pattern gets saved into the $1 and $2 variables respectively, allowing the last portion of the directive to translate the URL into the expected Mach-ii form. The first RewriteRule directive handles the most specific type of URL, where a user is requesting a specific page of a tutorial. This rule has three parenthetical expressions to match and translate into the Mach-ii URL. Remember how I mentioned each RewriteRule was listed from most specific to least specific? If you haven't figured it out yet, this order is important given each rule has the [PT,L] flags. The first rule will be tested first and if the pattern matches, mod_rewrite will stop its rule evaluation and apply the rewrite immediately. Ordering from most specific to least specific allows you to have SES URLs for even the most embedded site content while not trampling on SES URLs to top-level content.
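
If you want to experiment with the rule ordering outside of Apache, the following Python sketch mimics it: it tries three patterns in order and stops at the first match, which is roughly what the L flag does. The names SEG, RULES, and rewrite are invented for this illustration, the character class includes "_" so that names such as ses_urls match, and Python's re module stands in for Apache's regular expression engine.

```python
import re
from typing import Optional

# One URL segment: letters, digits, underscores, or hyphens.
SEG = r"([A-Za-z0-9_-]+)"

# Patterns and substitutions in the same order as the three RewriteRule lines.
RULES = [
    (re.compile(rf"^/go/{SEG}/{SEG}/{SEG}"), r"/index.cfm?go=\1:\2.\3"),
    (re.compile(rf"^/go/{SEG}/{SEG}"), r"/index.cfm?go=\1:\2"),
    (re.compile(rf"^/go/{SEG}/?$"), r"/index.cfm?go=\1"),
]


def rewrite(path: str) -> Optional[str]:
    """Return the rewritten path from the first matching rule, else None."""
    for pattern, substitution in RULES:
        match = pattern.search(path)
        if match:
            # Stop at the first rule that matches, like the L flag.
            return match.expand(substitution)
    return None


if __name__ == "__main__":
    for path in ("/go/tutorials/", "/go/tutorials/ses_urls",
                 "/go/tutorials/ses_urls/page1", "/about"):
        print(path, "->", rewrite(path))
```

Because the second pattern is not anchored at the end, it would also match /go/tutorials/ses_urls/page1 and silently drop the page segment if it were tried first. That is exactly why the three-segment rule has to come before it.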

A Final Consideration
Personally, I prefer configuring SES URLs at the Web server level due to the sheer power and flexibility provided. However, with this flexibility comes a bit of complexity. If you don't want to take on the Web server, or you're on shared hosting where you don't even have access to the Web server configuration, you have alternatives. Mach-ii 1.5 in particular is built with support for search engine friendly URLs through the addition of the BuildUrl() and BuildUrlToModule() methods. The beauty of this feature is that it travels with your Web site should you move your files from one server to another. You don't have to worry about pre-configuring your Web server with mod_rewrite directives before going live with a site move. Your SES configuration is built right into your ColdFusion files. For more on Mach-ii 1.5's URL features click here. For additional examples of using mod_rewrite to manipulate URLs in Apache, click here. I hope this article has helped explain what mod_rewrite is and how you can set it up to make the most of your Apache Web server. If something still isn't clear or you have questions, feel free to e-mail me.

About this post:

This entry was posted by Aaron West on December 3, 2007 at 8:00 AM. It was filed in the following categories: ColdFusion, Apache, Mach-ii. It has been viewed 132366 times and has 17 comments.

17 Responses to Using Apache's mod_rewrite: SES URL's and More

  1. Thanks for the post.. one thing, for the domain prefix, couldn't you just add
    ServerAlias domain.com

    to the conf?

  2. Great entry Aaron. Somehow I missed this when you first posted it.

  3. Thanks Dave. The information contained in this entry (with fewer examples) will also be available in an upcoming article in the Fusion Authority Quarterly Update Volume 2 Issue 4 (due out in February).

  4. Jason

    Just wondering what you thought about pointing your web server to a .cfm page for your site's 404 handler, and letting ColdFusion parse out the requested template? Has always worked very nicely for me and saves me having to learn url rewriting. But I don't think most folks do it this way. Why wouldn't you do it this way? Great post btw.

  5. @Jason, the two things you mention are actually quite separate. You can utilize URL rewriting on a Web server and still point 404's to a ColdFusion template. The first step is to define a "Missing Template Handler" in your ColdFusion Administrator. Any .cfm file requested - that does not exist - will cause the template you nominate in the Missing Template Handler setting to execute. Of course, missing resources like HTML files, images, and Flash movies will not be handled by this setting. Pointing your Web server to a .cfm page - as you mentioned - is also an option.

    Regardless of the mechanism you use, it is important to have something in place to handle 404's.

  6. Jason

    Hey Aaron - no I'm sorry: I don't mean the missing template handler. I mean actually setting IIS or Apache to serve a dynamic cfm template for any 404 request. That template can then parse out the web server's cgi variables (and they vary so you have to test - but everything I need I've seen on Apache, IIS and Sun ONE). A good example is tinyurl.com: instead of 404'ing every redirect, their 404 handler looks up (and then redirects) to what was requested. I've been taking this further with framework url parameters, etc. But I don't see a lot of people doing it, preferring instead to wade into url rewriting. So I figured I must be missing something - like perhaps that a dynamic 404 is bad practice or slow or something... Thanks - appreciate the exchange -J

  7. I am a Model-Glue developer and the URLs in this framework are slightly different from Mach-ii's.
    Can you please help me by providing an Apache mod_rewrite rule as well as an IIRF rewriter rule for doing the below conversion.

    I need to convert http://domain.com/index.cfm?event=register&foo...
    to http://domain.com/register/foo/bar/foo2/bar2/foo3/...

  8. @Shimju David

    I'll leave you to come up with the proper RegEx to handle Model-Glue URL's. If you follow the examples I've outlined this should be relatively easy to do.

  9. George

    Can you suggest a mod rewrite rule which could handle this.
    http://myhost.com/K1-2/K2-37/K3-45

  10. @George

    To support the kind of URLs you are asking about you'd want to create a regular expression that matched alphanumeric characters followed by a literal dash, followed by another set of alphanumeric characters.

    I'll admit up front, that I've never written a RegEx that did exactly what I wanted on first try. But, the below should get you on your way. There are two parenthetical expressions below which account for the characters before and after a dash in just the first part of your URL: myhost.com/K1-2/

    In between the two parenthetical expressions is (what I hope) a literal dash character. If you wanted to support URLs that repeated this pattern (your example had three) you'd just copy and paste everything between the last forward slash and second to last forward slash.

    RewriteRule ^/([A-Za-z0-9-]+)-([A-Za-z0-9-]+)/

    Of course, the above RewriteRule isn't complete as I didn't provide the string I want to use as the replacement condition. If I added that such as:

    RewriteRule ^/([A-Za-z0-9-]+)-([A-Za-z0-9-]+)/ /index.cfm?go=$1:$2 [PT,L]

    you'd be rewriting URLs from myhost.com/K1-2/ to myhost.com/index.cfm?go=K1:2 (note that the pattern as written requires the trailing slash).

    Since you only provided the source URL and not what you wanted it rewritten to I just made up the above example. Hope this helps.

  11. George

    @Aaron, thanks for the feedback. Sorry for not being more clear. Without any URL rewrite my URL would look like this. index.cfm?L1=25&L2=1&L3=4
    I want it to look like this index.cfm/L1-25/L2-1/L3-4 I am not using Fusebox or MachII. I am using a code based component that is accessed as a singleton
    object onRequestStart to rewrite the URL string. I figured this might be a better way to do it. I did notice that in my error log for Apache I am getting a page not found error.
    Also, will mod_rewrite work for straight calls without any parameters? How would I add this as well? article.cfm/L1-25/L2-1/L3-4

  12. George

    @Aaron, this is what my virtual host block looks like but I am getting an error restarting this server with this. Any thoughts on what might be wrong.
    #<VirtualHost *:80>
    # ServerAdmin email@myhost.com
    # DocumentRoot "C:/projects/gsigminidvworld"
    # DirectoryIndex index.cfm
    # ServerName lc.gsig.com
    # ServerAlias lc.gsig.com *.lc.gsig.com
    # RewriteEngine on
    # RewriteLog C:/Program Files/Apache Software Foundation/Apache2.2/logs/lc.gsig.com-rewrite_log
    # RewriteRule ^/([A-Za-z0-9-]+)-([A-Za-z0-9-]+) /index.cfm?L1=$1:$2 [PT,L]
    # ErrorLog "C:/Program Files/Apache Software Foundation/Apache2.2/logs/lc.gsig.com-error.log"
    # CustomLog "C:/Program Files/Apache Software Foundation/Apache2.2/logs/lc.gsig.com-access.log" common
    #</VirtualHost>

  13. @George - You're getting an error on Apache restart even with your entire VirtualHost block commented out? I'm going to have to do a lot of guessing here, but on Apache 2.2.x you have to explicitly enable virtual hosts. Doing so causes the httpd-vhosts configuration file to be included at startup. If this config file is empty, you might get errors. At the least, I'd leave the examples that come with the file uncommented and see if that gets Apache started.

    You might also try Apache's configtest, which is part of apachectl. On Mac OS X I simply navigate to the directory with apachectl and type: "apachectl configtest". If everything in my configuration files (all of them) checks out, Apache will show the text "Syntax OK". If something is wrong, it will give an error message to help you out.
