Backing up HubPages hubs to your computer
I hope you have backup copies of all your hubs. You just never know what could happen. While I'm sure HubPages backs up their servers regularly, it's ultimately your content that is at risk, so you need to be certain that you can resurrect your hubs if they are lost.
You might also decide at some later date that you want your hubs published somewhere other than HubPages. That's not a common decision, and not something I anticipate that I would ever do, but once again, you just never know. You've made an investment: protect it.
I'm going to present a number of ways to back up your hubs, both very simple and much more advanced. While I am dealing specifically with HubPages here, some of these methods could be useful for other sites that you may write at.
We'll start with the simplest method of all: a "Save As" from your browser.
Save As
Almost all browsers offer some choice that lets you save a local copy of any web page you are viewing. It may also offer you multiple ways to save the page; for example, here is Firefox offering four different options after I clicked on File->Save Page. These choices are explained at "Saving a web page" in the Firefox support documents.
Usually, you'd choose either the first or the second of those choices. The first stores an exact copy of the page, but will not save any referenced pictures. The second saves the pictures, but also changes the structure of the links that point at those pictures so that the local copies will be used if you view the local page (File->Open).
Your browser may only offer one or two choices - for example, Chrome, a popular alternative to Firefox, only offers the first two. Consider that any archive of your hubs is better than none at all!
Saving backups this way is handy, but it is clumsy if you want to take fresh copies of all your hubs regularly so that new comments and other changes are preserved locally. If you have several hundred hubs, it would also be quite time consuming.
There is another slight disadvantage in that you may be saving much more than you need. Both Firefox and Chrome store every Javascript file associated with a page in addition to the pictures. All of this is stored in a directory unique for each page, so there will be duplication of those scripts and any reused pictures. That wastes disk space, and you may not care about the Javascript files at all as they are specific to HubPages.
It would be nice to have some automated method to save backups, wouldn't it?
Which hubs?
If we're going to automate this, the first thing we need is a list of the hubs to save. Fortunately, HubPages makes that fairly easy for us. You may never have noticed this, but your HubPages Statistics has an "Export to CSV" option. The yellow arrow in the picture below shows where you would click to get that.
That will save a "Comma Separated Value" file to your computer. You could open that in any spreadsheet program (Excel, Open Office or another) and as you can see in this picture, it handily includes the actual URLs of your hubs.
Once we have a list of URLs, automating a backup of your HubPages hubs becomes much easier. For example, on Mac OS X, I could simply open a Terminal window and do
for i in `cat mylistofhubs`
do
  curl -OL $i
done
to get every URL copied locally. On Linux, I'd probably use "wget" instead (and I could install that on Mac or Windows also). Note that "mylistofhubs" is NOT the hubs.csv we downloaded - it's a separate text file, possibly created from that download or even by hand.
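One way to build that separate text file from the export is sketched below. It assumes the CSV rows look the way the Perl scripts later in this article expect: every field quoted, the URL in the first field, and the words "Not Published" somewhere on the row for hubs that are not live. The function names extract_hub_urls and fetch_hubs are made up for illustration; they are not part of any tool.

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical helpers.

# Print the first quoted field (assumed to be the hub URL) of every row
# that contains a URL and is not marked "Not Published".
extract_hub_urls() {
    grep -E 'https?://' "$1" | grep -v 'Not Published' | cut -d'"' -f2
}

# Download every URL listed in a text file (one per line) into the
# current directory. -O keeps the remote file name, -L follows redirects.
fetch_hubs() {
    while IFS= read -r url; do
        if [ -n "$url" ]; then
            curl -OL "$url"
        fi
    done < "$1"
}
```

With those defined, "extract_hub_urls hubs.csv > mylistofhubs" would create the list and "fetch_hubs mylistofhubs" would download it. Check the first few lines of your own export before trusting the column assumptions.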
Windows also has "VisualWget". You can paste your list of hubs into its Multiple Downloads list. I show that with just two URLs below, but you could put your entire list in.
Perl Scripts
I have mentioned Perl in some other articles here (see "Perl Scripts for Adsense", for example). Perl is included with Linux and Mac OS X, but can easily be added to Windows.
We can use Perl to automate this whole procedure from nothing more than the saved "hubs.csv" file.
Our first script only downloads the HTML files. It's short and simple:
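#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# All downloads go into one directory
mkdir "Hubpages" unless -e "Hubpages";
open(my $hubs, "<", "/Users/apl/Downloads/hubs.csv") or die "No hub list! $!";
while (<$hubs>) {
    next if not /http:/;       # skip the header and any row without a URL
    next if /Not Published/;   # unpublished hubs can't be fetched
    my ($url, $title) = split /","/;
    # a little cleanup
    $title =~ s/"//g;
    $url =~ s/"//g;
    # strip to basename
    my $out = $url;
    $out =~ s?http://hubpages.com/hub/??;
    print " Fetching $title\n";
    my $content = get $url;
    if (!defined $content) {
        warn "Could not fetch $url\n";
        next;
    }
    # same directory name (and case) as the mkdir above
    open(my $fh, ">:utf8", "Hubpages/$out") or die "Can't create $out $!";
    print $fh $content;
    close $fh;
}
close $hubs;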
This creates a directory "Hubpages" and simply downloads each of your hubs there.
That doesn't get our images, though, so a slightly more complicated script is what I use. This will require downloading the WWW::Mechanize module from CPAN.
CPAN is part of Perl. Unfortunately, Windows is not a friendly environment for this. You CAN use CPAN on Windows, it's just a little more difficult. On a Mac or Linux system, this can be as simple as typing "install WWW::Mechanize" within the CPAN shell. Yet another reason to prefer Macs over Windows.
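Before installing anything, it can be worth checking whether the module is already there. The sketch below does that from a terminal; have_perl_module is a hypothetical helper name, and it only relies on Perl's -M switch, which tries to load a module before running the (empty) program given with -e.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if Perl can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Report the result; installing is left as a separate, deliberate step
# (for example from the CPAN shell, as described above).
if have_perl_module WWW::Mechanize; then
    echo "WWW::Mechanize is already installed"
else
    echo "WWW::Mechanize is missing; install it from CPAN first"
fi
```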
The script is admittedly a bit more advanced:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();        # fetches the hub pages
my $pageimage = WWW::Mechanize->new();   # fetches the images
mkdir "Hubpages" unless -e "Hubpages";
mkdir "Hubpages/Images" unless -e "Hubpages/Images";
open(my $hubs, "<", "/Users/apl/Downloads/hubs.csv") or die "No hub list! $!";
while (<$hubs>) {
    next if not /http:/;       # skip the header and any row without a URL
    next if /Not Published/;   # unpublished hubs can't be fetched
    my ($url, $title) = split /","/;
    $title =~ s/"//g;
    $url =~ s/"//g;
    my $out = $url;
    $out =~ s?http://hubpages.com/hub/??;
    open(my $fh, ">:utf8", "Hubpages/$out") or die "Can't create $out $!";
    print " Fetching $url\n";
    $mech->get($url);
    print $fh $mech->content();
    close $fh;
    # now every image the page refers to
    foreach my $link ($mech->images) {
        my $image = $link->url;
        next if !defined $image || $image !~ /hubimg\.com/;
        # strip to basename and store in the shared Images directory
        my $imagesave = $image;
        $imagesave =~ s/.*\///;
        $imagesave = "Hubpages/Images/$imagesave";
        next if -e $imagesave;   # already have this one
        print "\tFetching $image\n";
        $pageimage->get($image);
        $pageimage->save_content($imagesave);
    }
}
close $hubs;
This saves images, but to avoid duplication, it stores them all to one directory (Images).
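The same "one shared directory, skip what we already have" idea can be tried from the shell against a page that has already been saved. This is only a rough sketch: the three function names are hypothetical, and the grep pattern is a crude stand-in for real HTML parsing that assumes image addresses appear as src="..." attributes.

```shell
#!/bin/sh
# Hypothetical helpers; the HTML handling is a rough grep, not a real parser.

# List the distinct image URLs in a saved page that come from hubimg.com.
list_hub_images() {
    grep -oE 'src="[^"]+"' "$1" | cut -d'"' -f2 | grep 'hubimg\.com' | sort -u
}

# Map an image URL to its path inside the shared image directory,
# keeping only the last part of the URL as the file name.
image_target() {
    printf '%s/%s\n' "$2" "${1##*/}"
}

# Print only the image URLs whose local copy does not exist yet.
missing_images() {
    list_hub_images "$1" | while IFS= read -r img; do
        [ -e "$(image_target "$img" "$2")" ] || printf '%s\n' "$img"
    done
}
```

Running "missing_images Hubpages/SomeHub Hubpages/Images" (SomeHub being a placeholder file name) would list what still needs fetching, which is the same check the Perl script makes with "next if -e $imagesave".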
Many of the images will not be ours, of course. You can see that in this snap from my computer. It would be possible to parse the page and only download those pictures that are in fact ours, but that's a much more complicated script and would be very specific to HubPages. As it is, this script could be used for any list of pages, not just Hubs.
I hope this gives you some ideas about getting backups of your pages. There are many other things we could do that would be HubPages specific. For example, we could parse the page looking for the specific text, photo and other modules we created and only save those. Again, that's a much more complicated script, but it might be worth the trouble.
Some people have noted that they create their hubs in Word or Notepad or whatever and that is their backup. Perhaps so, but that may not include edits you made later and won't show you where and how you inserted pictures and so on. It also will not include comments! Be safe: secure your pages with a local copy.
Thank You I will share this at facebook. Bookmarked it.
Great Hub! I write all of my hubs in notepad before I post them, so I have .txt files of all of my hubs. As for the pictures, I have local copies of them, as well, if they aren't stock images that I can just pull down at any point in time
Great info, PC! Thanks for posting! And thank you pdh for the link!
Wow this is awesome and I will do this. I will use the export to csv option. And try and work out this Perl code!
I'm bookmarking this. I write all my Hubs in Word documents and save those, of course. I want to try this simply for the sake of learning new ways to use the computer.
Pcunix
It will take me time to digest this; I'm not used to backing up other than with a simple Word doc. Great information, thanks. You're a gem!
This is great advice I am going to back everything up, like everyone else I have also bookmarked the page :)
Thank you Pcunix, with your many years experience, it is sure that this advice is saving many of us tons of grief. Thank you for the explanation. Bookmarked!
Pcunix,
Great Hub!
As far as tags..."Online back solution" and "online backup" come up on the keyword tool - is this competition or appropriate? Also "archiving files".
If I do this - then can I save the file to my laptop and work offline on revising it?
That's a good idea, I was doing it manually so far after publishing a hub. Automating it is a good idea and this also includes updates made later to the hubs. I'll have to take making backups more seriously in the long run.
I think it is a very good idea to backup my Hubs and to automate the process. Just not sure that I have the requisite skills to do it! Will have to see if I can follow your advice here.
Thanks so much.
Love and peace
Tony
Great information. It will take me a few reads to figure it all out, but it certainly seems to be the best way to secure our hubs. Thanks again.
thanks PC!
These are good ideas and it's a very good strategy to automate the backups.
I too only use a Microsoft Word document to back up my hubs. Now that you have shown me a better way to do it in this hub, I will certainly be doing it that way.
Thanks for a very interesting and informative hub.
Another great idea and something else I need to tinker with. Thanks a bunch. :)
Excellent ideas.
I never thought to back up my hub details in a spreadsheet before, off to start doing it now.
I haven't realized the possibilities until I read this, you are right, we must always have a back up copy of our works.. wish I can do this soon..
I think the idea of saving a hub to some sort of back up is a wise idea. Thank you for sharing the options. Great hub.
Thank you for the advice.
I voted "Awesome"
Have a good day.
Hmmm... I usually copy/paste into Word, save in a folder "Hub Pages Articles" and in the same folder, archive the photos that go with the articles.
Sometimes, I do it the other way around, and write the copy first in Word, and copy/paste it into HP capsules. ;-)
Scripts, I'm not even going to attempt--way too advanced for my skill level.
Then about once a week, I run a backup for "new and changed files" to be archived to my USB external hard drive...which is NEVER plugged in while I'm browsing the 'net.
File / Save As is a great idea. It's a backup, after all, and if it's a little difficult to deal with, well, it's a backup, after all.
OK, suppose I've backed-up pages and then want to republish one that has been unpublished and deleted. Do any of these methods circumvent redoing all the capsule work?
If you need a scenario, I changed my initial identity. I copied 15-20 hubs with Word and then did all that work again in the present identity.
Great information here, I have bookmarked your hub for future reading and reference.
Question: If you do the first option, assume the website were to go off the web... would you still have that web page saved even though the site may not exist anymore?
You have provided a very important info for all Hubbers. Since we earn through our own hubs....any digital disaster would affect our income.
Thanks for the reminder.
I see you're using a Mac, there is a much simpler way to accomplish this:
Handy tips on backup options.
Sophia Angelique Level 6 Commenter 16 months ago
Hot hub!!! As soon as I have time, I'm going to do this!!! Thanks.