Adding a Sitemap to a Rails App with Heroku

No site is complete without one of those good ole sitemaps. I build my site, then I make sure I’m taking care of the blocking and tackling from a search engine perspective. That means things like creating a sitemap! There are really two options when it comes to sitemaps on Heroku. The easy one is more manual; the other is completely automated, but much harder from a coding perspective.

Option 1: Simple Heroku Sitemap Implementation

The easiest way by far is to deploy your application to Heroku, then use a free sitemap-generating service to create your sitemap and upload it to your app’s public folder. The free sitemap generator will visit your website and follow all the links it can find, then package them into a sitemap you can download. They make it pretty easy.
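For reference, the file these services hand back follows the sitemaps.org XML format: a urlset element in the sitemap namespace with one url/loc entry per page. Here is a minimal sketch in plain Ruby that builds the same kind of file from a list of URLs; the method name and the example URLs are hypothetical, and it writes only the required loc element (lastmod, changefreq and priority are optional in the protocol).

```ruby
require "erb"

# Build a minimal sitemap.xml (sitemaps.org protocol 0.9) from absolute URLs.
# Only <loc> is written; the optional <lastmod>, <changefreq> and <priority>
# elements are left out of this sketch.
def build_sitemap(urls)
  lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ]
  urls.each do |url|
    # Escape &, <, >, and quotes so every URL is valid XML text.
    lines << "  <url><loc>#{ERB::Util.html_escape(url)}</loc></url>"
  end
  lines << "</urlset>"
  lines.join("\n") + "\n"
end

# Usage (hypothetical URLs):
# File.write("public/sitemap.xml", build_sitemap(["http://www.example.com/", "http://www.example.com/about"]))
```

Escaping matters because query strings with & are common and an unescaped & makes the whole file invalid XML.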

The downloaded file will be sitemap.xml. Simply move this file into your Rails app’s public folder, then commit and push to Heroku:

$ git add .

$ git commit -m 'added site map'

$ git push heroku master

Once it’s done deploying, you can visit http://www.your-heroku-app.com/sitemap.xml and view your brand-new sitemap.
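To double-check the file from a script instead of the browser, a small sketch using REXML (which ships with Ruby) can list every loc entry. It assumes the standard sitemaps.org namespace, which is why the XPath query binds it to a prefix. The method name is hypothetical, and the Net::HTTP fetch is only shown as a commented usage line with the same placeholder URL as above.

```ruby
require "rexml/document"

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Return every <loc> URL in a sitemap.xml string. Sitemap elements live in the
# sitemaps.org namespace, so the query maps the "sm" prefix to that URI.
def sitemap_locs(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//sm:loc", "sm" => SITEMAP_NS).map { |node| node.text.to_s.strip }
end

# Usage against the deployed file (placeholder URL):
# require "net/http"
# puts sitemap_locs(Net::HTTP.get(URI("http://www.your-heroku-app.com/sitemap.xml")))
```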

Bam, you’re done in under 15 minutes. But each time you want to update your sitemap, you’ll have to go to the free sitemap generator, download their sitemap, put it into your public folder, and deploy to heroku. The next approach is more automated, but much much more painful.

Option 2: AWS + sitemap_generator + Heroku Scheduler

Difficulty: Stab Your Eyes Out Painful

For this tutorial you’ll need an Amazon Web Services account. You’ll need your Access Key ID, your Secret Access Key, and an S3 bucket (this can be the hardest part). Here is a key generating tutorial from Amazon. I personally think their documentation is shitty and only serious pros can understand it, so I’m sorry if you can’t figure out their IAM crap. The unfortunate thing with AWS is that they probably hold an incredible amount of data, and they’ve done a really good job keeping it in the hands of the right people. It just makes it harder for us nooblies.

But the bottom line is you’ll need all that key information from Amazon.

Gemfile

gem 'sitemap_generator'
gem 'fog-aws'

$ bundle

Yay, we’ve got all the gems installed!

$ rake sitemap:install

This creates a file, config/sitemap.rb, which you’ll now need to open and configure.

config/sitemap.rb

# Set the host name for URL creation
SitemapGenerator::Sitemap.default_host = "http://www.clashprogress.com"

# Upload the generated files to S3 through fog, using the Heroku config vars
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new(
  fog_provider: 'AWS',
  aws_access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  aws_secret_access_key: ENV['AWS_SECRET_ACCESS_KEY'],
  fog_directory: ENV['S3_BUCKET'],
  fog_region: ENV['AWS_REGION']
)

SitemapGenerator::Sitemap.sitemaps_host = "https://s3-#{ENV['AWS_REGION']}.amazonaws.com/#{ENV['S3_BUCKET']}/"

SitemapGenerator::Sitemap.sitemaps_path = 'sitemaps/'
SitemapGenerator::Sitemap.public_path = 'tmp/' # local copies go to tmp/ (Heroku's filesystem is ephemeral)

SitemapGenerator::Sitemap.create do
  # here is where you add all the pages you'd like to sitemap
  add posts_path, priority: 1, changefreq: 'always'
  Post.find_each do |post|
    add post_path(post.slug), lastmod: post.updated_at
  end
end

sitemap_generator offers good documentation on adding the right pages to config/sitemap.rb. You’ll mostly be using Rails route helpers and basic ActiveRecord queries, the kind you’d try out in rails c.
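The sitemaps_host and sitemaps_path settings decide the public URL you’ll later paste into robots.txt. Here is a hypothetical helper (not part of sitemap_generator) that reproduces the same string concatenation, so you can predict the URL before anything is uploaded. It assumes the legacy s3-REGION host style this post uses and the default sitemap.xml.gz file name.

```ruby
# Hypothetical helper: combine region, bucket, sitemaps_path and file name the
# same way the sitemaps_host/sitemaps_path settings in config/sitemap.rb do.
# Assumes the "s3-REGION" host style and the default sitemap.xml.gz name.
def s3_sitemap_url(region, bucket, sitemaps_path = "sitemaps/", filename = "sitemap.xml.gz")
  host = "https://s3-#{region}.amazonaws.com/#{bucket}/"
  host + sitemaps_path + filename
end

# Usage:
# s3_sitemap_url("us-west-2", "clashprogress")
```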

Setting Config Variables on Heroku

$ heroku config:set AWS_ACCESS_KEY_ID='my_access_key_from_aws' AWS_SECRET_ACCESS_KEY='super_secret_access_key' S3_BUCKET='bucket_name' AWS_REGION='aws_region'

Obviously, you’ll have to use your own AWS account information for all those fields. For example, I created a bucket named clashprogress, so I’d run the command

$ heroku config:set S3_BUCKET='clashprogress'
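If one of these variables is missing, the upload fails in confusing ways. You can list what’s set with heroku config, and a small guard like the sketch below makes a misconfigured dyno fail loudly instead. The constant and method names are hypothetical, and putting the check at the top of config/sitemap.rb is an assumption of this sketch, not something sitemap_generator requires.

```ruby
# Hypothetical guard: the config vars config/sitemap.rb reads.
REQUIRED_SITEMAP_VARS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY S3_BUCKET AWS_REGION].freeze

# Return the names of required vars that are missing or blank.
# Pass ENV in production; any Hash works for testing.
def missing_sitemap_vars(env = ENV)
  REQUIRED_SITEMAP_VARS.select { |name| env[name].to_s.strip.empty? }
end

# Usage at the top of config/sitemap.rb (assumption):
# missing = missing_sitemap_vars
# abort "Missing config vars: #{missing.join(', ')}" unless missing.empty?
```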

So now we’ve got sitemap_generator configured and the Heroku config variables set, which means our application should be able to talk to AWS through fog. All that’s left is configuring Heroku to run the right rake command. So let’s commit our shit and push it to Heroku.

$ git add .

$ git commit -m 'added sitemap config'

$ git push heroku master

Add Heroku Scheduler To your App, Open It, Add Rake Task

At this point, try running the sitemap generator on Heroku:

$ heroku run rake sitemap:refresh

This command should create your sitemap, upload it to AWS, and ping Google and Bing to say, hey, here is my new sitemap! If it works, you can then navigate in your AWS account to see your new sitemap. Say a little prayer before running that command. I’m not religious, but 95% of the time it doesn’t help. It’s worth it for that 5%, though. If it works, I’d cheer loudly.
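sitemap_generator uploads a gzipped file (sitemap.xml.gz), so your browser may just download it instead of showing it. To peek inside the copy you grab from your bucket, a sketch like this decompresses it with Ruby’s built-in Zlib and counts the url entries. The method name is hypothetical, and the substring count is a rough sanity check rather than a full XML parse.

```ruby
require "zlib"

# Decompress the bytes of a sitemap.xml.gz and count its <url> entries.
# "<url>" never matches "<urlset", so the count covers page entries only.
def gz_sitemap_url_count(gz_bytes)
  xml = Zlib.gunzip(gz_bytes)
  xml.scan("<url>").size
end

# Usage, after downloading the file from your bucket:
# puts gz_sitemap_url_count(File.binread("sitemap.xml.gz"))
```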

Okay, but really we want to automate this. Who wants to run a rake command every time the sitemap needs updating? No one remembers to update the sitemap. Plus, if you accidentally run the command without heroku run, you’ll create a sitemap for your local development environment, and that’s just embarrassing. I only left a local development sitemap out there for my site for a few days. Oops. I guess that’s why they created Heroku Scheduler. It’s a free add-on that runs rake commands for you on a schedule. It never messes up!

Yeah, so add the add-on by running this command:

$ heroku addons:create scheduler:standard

That one just created your heroku addon, Merry Christmas.

$ heroku addons:open scheduler

This one is going to open your browser. When it does, you’ll see a nice little button, Add New Job. Yes, click that button, enter rake sitemap:refresh into the box, and set the frequency to Daily.

Bam, now our sitemap is being updated every day without us having to do anything!

Tell those robots where our sitemap is

Robots are stupid. That’s why they expect our sitemap to be at www.ourwebsite.com/sitemap.xml. But we decided to throw a curveball at these robots, and they didn’t expect it. So we gotta tell these dumbass bots where to look. Log in to your AWS account and navigate to your newly uploaded sitemap. Right-click on it and click Properties. In this menu you’ll find a link to your sitemap. Copy this link.

Then, in your Rails app, open public/robots.txt and add the following line at the very bottom:

Sitemap: https://s3-us-west-2.amazonaws.com/clashprogress/sitemaps/sitemap.xml.gz

Of course, you’ll want to replace that URL with your own sitemap’s URL.
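If you want to confirm what a crawler will find, the sketch below pulls every Sitemap URL out of a robots.txt body. The field name is matched case-insensitively so SITEMAP: and Sitemap: both count; the method name is hypothetical, and inline comments after the URL are not handled.

```ruby
# Return every Sitemap URL declared in a robots.txt body.
# Split on the first colon only, so the "https:" inside the URL survives.
def robots_sitemaps(robots_txt)
  robots_txt.each_line.filter_map do |line|
    field, value = line.split(":", 2)
    value.strip if value && field.strip.casecmp?("sitemap")
  end
end

# Usage:
# robots_sitemaps(File.read("public/robots.txt"))
```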

Submitting Sitemap to Webmaster Tools

After a couple of days of having my sitemap on Amazon, Google still hadn’t discovered or indexed anything. Google is a punk sometimes. I also couldn’t submit the sitemap through Webmaster Tools, because Google forces you to use the verified domain, in my case http://www.clashprogress.com. So what is one to do? 301 redirect!!!

301 Redirect to Sitemap

At first I thought I’d handle it through my DNS provider, DNSimple, but they didn’t offer anything like this. Basically, I wanted anyone who visited http://www.clashprogress.com/sitemap.xml to be redirected to https://s3-us-west-2.amazonaws.com/clashprogress/sitemaps/sitemap.xml.gz.

To do this, I added one line to config/routes.rb:

get '/sitemap.xml', to: redirect("https://s3-us-west-2.amazonaws.com/clashprogress/sitemaps/sitemap.xml.gz", status: 301)

That part wasn’t so bad, but the rest ain’t so easy. Good luck, you’ll probably need it.
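For the curious, here is roughly what that redirect route does at the HTTP level, written as a bare Rack-style lambda. This is an illustrative sketch, not how Rails implements redirect internally, and the constant names are hypothetical: a request for /sitemap.xml gets a 301 status whose Location header points at the S3 copy, and crawlers follow it.

```ruby
# Hypothetical constants; the URL is the S3 sitemap from this post.
SITEMAP_S3_URL = "https://s3-us-west-2.amazonaws.com/clashprogress/sitemaps/sitemap.xml.gz"

# Rack-style app: [status, headers, body]. A 301 tells clients the move is permanent.
SITEMAP_REDIRECT = lambda do |env|
  if env["PATH_INFO"] == "/sitemap.xml"
    [301, { "location" => SITEMAP_S3_URL, "content-type" => "text/plain" }, ["Moved Permanently"]]
  else
    [404, { "content-type" => "text/plain" }, ["Not Found"]]
  end
end
```

In the Rails app itself, the one-line route is all you need.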
