File transfer using Globus on Your Home Computer, Clusters, and Data Archives
Files can be transferred to a Jetstream instance in several ways. For small files, such as a simple two-line text file, copying and pasting the content is probably just as quick as transferring the file. For larger or multiple files, such as a directory (folder) or a bigger file, you can transfer them using:
- scp, rsync (if you are comfortable with the command line),
- WinSCP, FileZilla (Windows users), or
- Cyberduck (Windows and macOS).
Using these tools to transfer files and manage data is also covered here.
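For the command-line route, the sketch below assembles an rsync command that keeps partially transferred files, so a rerun after a dropped connection can pick up where it stopped. It is only an illustration: the user name, host name, and paths are hypothetical placeholders, and the command is printed rather than executed.

```python
import shlex


def build_rsync_command(source, user, host, dest_path):
    """Return an rsync argv list for copying `source` to a remote host."""
    return [
        "rsync",
        "-a",          # archive mode: recurse and keep permissions and timestamps
        "-v",          # verbose output
        "--partial",   # keep partially transferred files so a rerun can resume
        "--progress",  # show per-file progress
        source,
        f"{user}@{host}:{dest_path}",
    ]


# Hypothetical values; replace them with your own login name, host, and paths.
cmd = build_rsync_command(
    "results/", "username", "my-instance.example.org", "/home/username/results/"
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Note that a trailing slash on the source (results/) tells rsync to copy the contents of the directory rather than the directory itself.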
However, when you have really large files, such as raw sequence data, transferring them via the command line can be tedious, not to mention problematic if your internet connection drops out. In these cases, Globus is a good alternative. Globus allows you to transfer gigabytes or terabytes of data securely and quickly.
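To get a feel for why an interrupted connection matters at this scale, here is a rough back-of-the-envelope estimate of transfer time. The 100 Mbit/s link speed is an assumption chosen for illustration, and the calculation ignores protocol overhead and shared bandwidth, so real transfers will usually be slower.

```python
def transfer_hours(size_bytes, link_mbit_per_s):
    """Idealized transfer time in hours for a given size and link speed."""
    bits = size_bytes * 8
    seconds = bits / (link_mbit_per_s * 1_000_000)
    return seconds / 3600


# Assumed link speed of 100 Mbit/s; sizes use decimal units (1 GB = 10**9 bytes).
for label, size in [("10 GB", 10 * 10**9), ("1 TB", 10**12)]:
    print(f"{label}: {transfer_hours(size, 100):.1f} h at 100 Mbit/s")
# 10 GB: 0.2 h at 100 Mbit/s
# 1 TB: 22.2 h at 100 Mbit/s
```

A transfer that runs for most of a day is very likely to see at least one network hiccup, which is where a managed transfer service helps.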
Before you begin
To transfer files between two points using Globus, make sure both points (for example, a laptop and a cluster) have active Globus endpoints.
Endpoints are the two points you want to transfer data to and from. Both must have Globus installed and set up according to the endpoint type. Setting Globus up on your own computer is free (see below), but cluster, academic, and commercial endpoints require a paid subscription. For more information about subscription types and their features, see the subscription information here.
- If you are using IU HPC or XSEDE clusters, look no further: all of these systems already have an active Globus endpoint. On Jetstream, a Globus subscription is available, but Globus is not installed or set up on VMs; to do that, first follow this tutorial, then go here.
- If you are using other clusters, contact your HPC team to find out whether a Globus endpoint is set up.
Setting up a Globus endpoint on your personal computer
It is likely that you will need to transfer files between your preferred cluster and your home computer at some point. To set up Globus on your home computer, first register for an account with Globus Connect by going here and clicking “Sign in”. If your university or company does not have a subscription, you can set up an account with Google by following the steps here.
You have access to Globus Plus features with an XSEDE account (required for Jetstream; if you don’t have one, go here and fill out the form. It does not take long. See here for more detail on getting started on XSEDE and Jetstream). Globus Plus allows you to transfer files between two personal computers or between a Jetstream instance and a personal computer. Log in to your Globus account and select “Accounts”.
Go to the subsection “Globus Plus”.
Select the option XSEDE Plus Sponsor.
Once you click XSEDE Global Plus Users, select “Submit Application”.
Once you submit the application, you will receive an email at the primary identity listed under Accounts (this usually takes about an hour). As soon as you are accepted, you will be able to transfer between two Globus personal endpoints (e.g., your computer and your Jetstream VM)!
More information and tutorials on Globus setup are available here.
Globus endpoints on HPC clusters
If your company or institution has a Globus endpoint installed (this is a paid subscription), you should be able to search for it or find it in your local documentation. For example, if you have access to IU clusters, you will have automatic access to the /N/dc2/scratch/username directory, and if you enter the path, you can also access your /N/dc2/project directories. Files here can be transferred to other Globus endpoints or to your personal computer.
On the Globus transfer page, look for the endpoint named iu#dc2. Follow these steps:
1. Log in at https://www.globus.org/
You should see this page after logging in; if not, click on “Transfer”.
2. Click on “Endpoint” and you should see something similar to this. In this example, there are three endpoints set up: IU’s file system (iu#dc2), an endpoint on my bcbio instance created as described in the first section (bcbio_js), and my laptop (my laptop). Now I can transfer files between any of these three endpoints. If you don’t see an endpoint you just set up, search for it in the tab highlighted below.
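If you prefer a terminal to the web page, the Globus command-line client offers a full-text endpoint search. The sketch below only builds and prints the command. It assumes the globus-cli package is installed and that you have already authenticated with `globus login`; whether a given display name such as iu#dc2 is matched by the search is not guaranteed.

```python
import shlex


def build_endpoint_search(search_text):
    """Return the argv list for a Globus CLI full-text endpoint search."""
    return ["globus", "endpoint", "search", search_text]


# "iu#dc2" is the endpoint name used in this tutorial; any search text works.
cmd = build_endpoint_search("iu#dc2")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

The search results include each endpoint's ID, which is what the command-line client expects when you refer to an endpoint.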
Globus endpoint on Data Archives
Data archiving is a very important step in data management. IU users have access to the Scholarly Data Archive (SDA), and XSEDE users have access to Wrangler (https://portal.xsede.org/tacc-wrangler). Both of these data archives also have a Globus endpoint set up, so transferring your data is much easier with Globus.
To transfer data from an HPC system, for example Carbonate (Globus endpoint: iu#dc2), or from your laptop to SDA (Globus endpoint: IU – Scholarly Data Archive), use the endpoint names given here. The Wrangler endpoint is “XSEDE TACC Wrangler”.
Log in to Globus, look for these endpoints, and start transferring files.
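The same kind of transfer can also be submitted from a terminal with the Globus command-line client. The following is a minimal sketch, not an official recipe: it assumes globus-cli is installed and you are logged in, the two endpoint IDs are placeholders you would look up yourself, and the raw_reads directory and the archive path are hypothetical. The command is only built and printed here.

```python
import shlex


def build_globus_transfer(src_id, src_path, dst_id, dst_path, label, recursive=True):
    """Return the argv list for a `globus transfer` between two endpoints."""
    cmd = ["globus", "transfer"]
    if recursive:
        cmd.append("--recursive")  # needed when the source is a directory
    cmd += ["--label", label]      # human-readable name for the transfer task
    cmd += [f"{src_id}:{src_path}", f"{dst_id}:{dst_path}"]
    return cmd


# Placeholder endpoint IDs and hypothetical paths; substitute your own values.
cmd = build_globus_transfer(
    "SOURCE_ENDPOINT_UUID", "/N/dc2/scratch/username/raw_reads/",
    "DEST_ENDPOINT_UUID", "/path/on/archive/raw_reads/",
    label="raw reads to archive",
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Once submitted, the transfer runs on the Globus side, so you can close your laptop without interrupting a transfer between two server endpoints.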
DONE! You are all set, so transfer away.
- I have logged in to Globus but can’t find the IU#dc2 endpoint to transfer files to IU computers. Here is the trick: you need to log in to Globus using your Google account, which means:
- You need to have your Google account (firstname.lastname@example.org), XSEDE account (email@example.com), IU ID (firstname.lastname@example.org), and any other accounts all listed under Accounts.
- Now log out and log back in using your Google credentials.
- Search for the endpoint IU#dc2, and you should be able to connect to it.
- I still cannot get access to an IU/XSEDE/whatever computer. One of our staff ran into this when adding a new account from a different machine. According to Globus (we had to submit a help ticket! We’re on that end of things at times too ^_^), a common issue is that different computing systems use different authentication software, and adding new ones can confuse the system. There is a very easy fix for this:
- Go to https://www.globus.org/app/endpoints?scope=all
- Click your endpoint. If it is not showing up, click “search all” in the grey bar above the list, type the name in the search box, and then click on the name.
- You should see a bunch of information on the machine, and some tabs that say “Overview”, “Server”, etc.
- Click on “Extend Activation”, then continue, and you will be sent to a page that asks for your identity provider. Select the organization or institution that owns the machine you are trying to sign in to (e.g., Indiana University). Then sign in through that institution’s login system.
- Do this for each of your machines that you cannot access. This should solve the issue!
- I am transferring files to my local computer but can only see the Documents folder. This is because, by default, the Globus endpoint is set up to access only your Documents folder. To add other folders, such as your hard drive, Google Drive, or Box, follow these steps:
If you have any questions, email us at email@example.com.