Back up and restore a cluster

BigAnimal automatically and continuously saves backups of your clusters. You can restore your cluster to any point in the past after the initial backup on cluster creation. That functionality is included in the free trial as well. While that means you don't have to do anything to generate backups, it's always sensible to test your backups.

If you haven't already, create a cluster on BigAnimal.

We're going to add a database called "baseball," which we'll populate with some Major League Baseball statistics.

create database baseball;

Now you can switch to your new (and empty) baseball database. You're prompted for the edb_admin password you provided when connecting to the cluster.

\c baseball

Normally you use the log files to determine the moment in time that you want to restore. For the purposes of this demonstration, let's change the prompt to include a timestamp to have an accurate timing of our activities. For convenience, we'll use the (almost) ISO-8601 format that the restore command accepts.

\set PROMPT1 '%`date +"%Y-%m-%dT%H:%M:%S%z"` %/%R%# '

For this demonstration, we're just going to import batter data from the Baseball Databank, which is in CSV form. While it's easy to import the data using PostgreSQL's COPY command, we'll need to first define a table to put that data into. Most of the columns are integers, but there are a few strings to consider as well. You can copy and paste this command into your terminal.

CREATE TABLE batters (
                      id SERIAL,
                      playerid VARCHAR(9),
                      yearid INTEGER,
                      stint INTEGER,
                      teamid VARCHAR(3),
                      lgid VARCHAR(2),
                      g INTEGER,
                      ab INTEGER,
                      r INTEGER,
                      h INTEGER,
                      "2b" INTEGER,
                      "3b" INTEGER,
                      hr INTEGER,
                      rbi INTEGER,
                      sb INTEGER,
                      cs INTEGER,
                      bb INTEGER,
                      so INTEGER,
                      ibb INTEGER,
                      hbp INTEGER,
                      sh INTEGER,
                      sf INTEGER,
                      gidp INTEGER,
                      PRIMARY KEY (id)
);

Now we can populate the table from the internet using the most recent data.

\COPY batters(playerid,yearid,stint,teamid,lgid,g,ab,r,h,"2b","3b",hr,rbi,sb,cs,bb,so,ibb,hbp,sh,sf,gidp) FROM PROGRAM 'curl "https://raw.githubusercontent.com/chadwickbureau/baseballdatabank/master/core/Batting.csv"' DELIMITER ',' CSV HEADER

Just to prove there's data loaded, let's look at the home run leaders for the 1998 season.

SELECT playerid, yearid, teamid,
       rank() OVER (PARTITION BY yearid ORDER BY hr desc) hr_rank,
       hr
FROM batters
WHERE yearid = 1998
ORDER BY hr_rank LIMIT 5;

Suppose someone wanted to revise history a bit.

UPDATE batters 
SET hr=0 
where playerid = 'mcgwima01' AND yearid = 1998;

Note the time so we can restore that data later. Verify the data has been changed by rerunning the 1998 home run leader query. Now go ahead and drop the whole table.

DROP TABLE batters;

You can verify the table is gone by looking at the list of tables.

\dt

Restore the cluster using the timestamp before the time you dropped the table.

biganimal cluster restore --id p-xxxxxxxxxxx --from-deleted --restore-point 2022-03-31T02:39:40+0000
Output
✔ New Name for Restored Cluster, Leave empty to continue with source cluster data: █
Restored Cluster Password: *************
Region: US East 1
Instance Type: r5.large(2vCPU, 16GB RAM)
Volume Type: General Purpose SSD (GP3)
Volume Properties: gp3 (1 Gi - 16384 Gi, 3000 IOPS, 125 MiB/s tput)
✔ Specify database configuration, for example "application_name=sample_app&array_nulls=true". Leave empty to continue with source cluster data: █
High Availability: Keep Existing
Networking: Keep Existing
Add IP Range to allow network traffic to your cluster from the public Internet: Keep Existing
Are you sure you want to Restore Cluster ? [y|n]: y
Restore Cluster operation is started
Cluster ID is "p-yyyyyyyyyy"
To check current state, run: biganimal show-clusters --id p-yyyyyyyyyy
Optional arguments for the CLI

We're only providing the information we have handy to the restore command: the ID of the cluster, the fact that the cluster is deleted, and the timestamp for the restore point. The CLI will prompt for additional information, such as the password or the region to restore to, and allow you to enter or select from a list as appropriate.

If you know ahead of time the options you'll need, you can provide all of them using parameters, and the command will run non-interactively. The values for the other parameters you don't specify will be inherited from the source cluster.

Check the status of the new cluster from the command line.

biganimal cluster show

Once provisioning is complete, you'll want to grab the new connection information. The hostname changed!

biganimal cluster show-connection --id  p-yyyyyyyyyy

Log in to the new cluster. Be sure to use the new hostname, and use the edb_admin password you provided when restoring the cluster.

Verify that the batters table was restored:

select playerid, yearid, teamid,
       rank() OVER (PARTITION BY yearid ORDER BY hr desc) hr_rank,
       hr
from batters
where yearid = 1998
order by hr_rank limit 5;

Further reading

BigAnimal CLI reference and Backing up and restoring in the full version documentation.