Google Search Console, sitemap file and WordPress

Generating a sitemap for your WordPress-based blog is simple and hard at the same time.

It’s hard because it’s difficult to find a good plugin that will generate the sitemap for you.

It’s simple because you can do it without any plugin at all.

You will need: search enabled on your blog, some time to find the oldest search result, and a small helper script.

First of all, press Enter inside your blog’s search field without typing anything. This gives you a page with search results for an empty query, that is, a paginated list of your posts.

Note the address in the address bar. It should look something like http://www.owsiak.org/?s= (an empty search query).

Now, go ahead and click "older posts" for as long as the link is there. It will take some time, but you can see how the address bar changes its value: http://www.owsiak.org/page/2/?s, http://www.owsiak.org/page/3/?s, etc. In my case, there are 34 pages in total, which means we have to query the blog 34 times to get all the search results.
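
Clicking through the pages by hand works, but the page count can also be probed from the command line. The sketch below is only an illustration: the helper names http_status and count_pages are made up for this example, and it assumes that your blog answers with HTTP 200 for existing result pages and with something else (typically 404) once you go past the last one. Check that assumption against your own blog before trusting the number.

```shell
#!/bin/bash

# Hypothetical helper: print only the HTTP status code for a URL.
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

# Hypothetical helper: count result pages by probing /page/N/?s.
# Assumes page 1 exists and that pages past the end do not return 200.
# The second argument caps the number of probes (default 500).
count_pages() {
  local base="$1" max="${2:-500}" n=1
  while [ "$n" -lt "$max" ] &&
        [ "$(http_status "${base}/page/$((n + 1))/?s")" = "200" ]; do
    n=$((n + 1))
  done
  echo "$n"
}

# Usage (queries the live site):
#   count_pages http://www.owsiak.org
```

The cap is there so the loop still ends if a blog happens to answer 200 for every page number.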

Now, let’s automate it:

# -- 8< -- cut here -- script.sh -- cut here -- 8< --

#!/bin/bash

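# Download every search-result page; adjust the URL and the page count for your blog.
for i in $(seq 1 34); do
  # Quoting keeps "?" from being treated as a glob; -L follows redirects, if any.
  curl -L "http://www.owsiak.org/page/${i}/?s" -o "page_${i}"
done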

# -- 8< -- cut here -- cut here -- cut here -- 8< --

Save it as script.sh and run it. Of course, make sure to adapt the script to your needs.

> chmod +x script.sh
> ./script.sh

You will end up with lots of partial results inside files page_1, …, page_34. These are your search results for your blog.
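
A failed or interrupted download would silently drop posts from the sitemap, so it may be worth checking the files before going further. A minimal sketch, assuming the files are named page_1, …, page_N as above and sit in the current directory; missing_pages is a name invented for this example.

```shell
#!/bin/bash

# Hypothetical helper: list page_N files (N = 1..last) that are
# missing or empty in the current directory.
missing_pages() {
  local last="$1" i
  for i in $(seq 1 "$last"); do
    [ -s "page_${i}" ] || echo "page_${i}"
  done
}

# Usage:
#   missing_pages 34    # prints nothing when all 34 files have content
```

Any file name it prints is a page worth downloading again.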

Now, let’s extract the links. Please note that this part depends on your blog template, so check the HTML in the downloaded files and adjust the extraction accordingly.

> cat page_* | grep "<li>" | cut -f2 -d"=" | cut -f2 -d'"' > site-map.txt
> tail site-map.txt | sed 's/^/- /'
- http://www.owsiak.org/redirecting-http-traffic/
- http://www.owsiak.org/mr-robot-some-thoughts-about-bugs/
- http://www.owsiak.org/coding-apprentice-todos/
- http://www.owsiak.org/if-you-have-makefile-that-has-weird-name-but-you-still-want-syntax-highlighting/
- http://www.owsiak.org/makefile-all-you-wanted-to-know-about-variable-but-were-afraid-to-ask/
- http://www.owsiak.org/makefile-know-your-location/
- http://www.owsiak.org/scratch-programming-playground-by-al-sweigart/
- http://www.owsiak.org/two-line-prompt-with-colors-and-gadgets/
- http://www.owsiak.org/mwm-confing-for-people-who-like-minimalism-during-remote-work/
- http://www.owsiak.org/makefile-that-calls-itself-without-hardcoding-file-name/
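
To see why the pipeline above works, assume a result line looks like <li><a href="http://www.owsiak.org/some-post/">Some post</a></li> (an assumption about the template; yours may differ). The first cut splits the line at = and keeps what follows href, the second splits at the double quote and keeps the URL. A slightly more forgiving variant, sketched below, matches the href attribute directly and removes duplicate links. The name extract_links is hypothetical, and the sketch relies on grep -o, which both GNU and BSD grep provide.

```shell
#!/bin/bash

# Hypothetical helper: print each distinct link found on <li> lines
# of the given files, in order of first appearance.
extract_links() {
  grep -h '<li>' "$@" \
    | grep -o 'href="[^"]*"' \
    | sed -e 's/^href="//' -e 's/"$//' \
    | awk '!seen[$0]++'
}

# Usage:
#   extract_links page_* > site-map.txt
```

Unlike the cut version, this keeps every href on a matching line, so menus or sidebars built from <li> elements may add links you do not want. Inspect the result before using it.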

Once you have your site-map.txt, upload it to your blog’s server and then add it in Google Search Console.
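
A quick sanity check of the file can save a rejected submission. Google’s documentation for text sitemaps asks for one full URL per line and at most 50,000 URLs per file; double-check the current limits before relying on these numbers. The sketch below (check_sitemap is a made-up name) reports lines that do not look like absolute http or https URLs and fails if it finds any.

```shell
#!/bin/bash

# Hypothetical helper: fail if any line is not an absolute http(s) URL
# or if the file has more than 50,000 lines.
check_sitemap() {
  awk '
    !/^https?:\/\/[^ \t]+$/ {
      printf "line %d is not a full URL: %s\n", NR, $0
      bad++
    }
    END {
      if (NR > 50000) { print "too many URLs: " NR; bad++ }
      exit (bad > 0)
    }
  ' "$1"
}

# Usage:
#   check_sitemap site-map.txt && echo "looks fine"
```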

That’s it.