Parallel download of files using curl

In a previous blog, I showed how to download files using wget. The interesting part of this blog was to pass the authentication cookies to the server as well as using the file name given by the Content-Disposition directive when saving the file. The example of the previous blog was to download a single file. What if you want to download several files from a server? Maybe hundreds or even thousands of files? wget is not able to read the location from a file and download these in parallel, neither is curl capable of doing so. You can start the download as a sequence, letting wget/curl download the files one by one, as shown in my other blog. Just use a FOR loop until you reach the end.

Commands

For downloading a large amount of files in parallel, you`ll have to start the download command several times in parallel. To achieve this, several programs in bash must be combined.

Create the list of files to download. This is the same as shown in my previous blog.

for i in 1 {1..100}; do `printf "echo https://server.fqdn/path/to/files/%0*d/E" 7 $i` >> urls.txt; done

Start the parallel download of files. Start 10 threads of curl in background. This is an enhanced version of the curl download command of my previous blog. Xargs is used to run several instances of curl.

nohup cat urls.txt | xargs -P 10 -n 1 curl -O -J -H "$(cat headers.txt)" >nohup.out 2>&1 &

Explanation

The first command is creating a list of files to download and stores them in the file urls.txt.

The second command is more complex. First, cat is printing the content of urls.txt to standard-out. Then, xargs is reading from standard-in and uses it as input for the curl command. For authentication and other headers, the content of the file headers.txt is used. The input for curl for the first line is then:

curl -O -J -H "$(cat headers.txt)" https://server.fqdn/path/to/files/0000001/E

The parameter –P 10 informs xargs to run the command 10 times in parallel. It takes the first 10 lines of input and starts for each input a new curl process. Therefore, 10 processes of curl are running in parallel. To run more downloads in parallel, give a higher value for –P, like 20, or 40.

To run the download in background, nohup is used. All output is redirected to nohup.out: >nohup.out 2>&1

SSH

To have the download running while being logged on via SSH, the tool screen should be used. After logon via ssh, call screen, run the above command, and hit CTRL + A + D to exit screen.

ssh user@server.fqdn
screen
nohup cat urls.txt | xargs -P 10 -n 1 curl -O -J -H "$(cat headers.txt)" >nohup.out 2>&1 &
CTRL+A+D
exit
Let the world know

Operate GateOne behind Apache reverse proxy

After installing GateOne on my Raspberry Pi 2 Debian system, I can log on to SSH via HTTP and browser. But only from my internal network, as the external accessible port is blocked by Apache. To add GateOne from outside, I either can disable Apache (no, won`t do it) or make GateOne accessible through Apache. I`ll use Apache as a reverse proxy for this.

Note: you`ll need Apache 2.4 for this, as GateOne uses Websocket for communication, and this is included only ootb with Apache 2.4.

Configuration

GateOne

I won`t use a subdomain for this example, so no new Apache virtual host will be use. This means that I have to use a URL prefix to access GateOne. The prefix is: /ssh. This must be configured in the GateOne configuration file:

sudo vim /opt/gateone/server.conf

Change the parameter url_prefix and restart GateOne

url_prefix = "/ssh"

To be able to access GateOne from external, the URL of the external server must be added to origin. In my case, this means that www.itsfullofstars.de is added.

origins = "https://www.itsfullofstars.de:8081;http://127.0.0.1;http://localhost

Apache

To make use of Apache as a reverse proxy, first the modules must be enabled. You can do this with a2enmod. Add also the web socket module

sudo a2enmod proxy_wstunnel
sudo a2enmod proxy_http 

Edit the apache configuration to add a reverse proxy for /ssh. Do this for HTTP and WS. In case GateOne listens on TLS, do this for HTTPS and WSS.

ProxyPass /ssh ws://127.0.0.1:9080/ssh
ProxyPassReverse /ssh ws://127.0.0.1:9080/ssh
ProxyPass /ssh http://127.0.0.1:9080/ssh
ProxyPassReverse /ssh http://127.0.0.1:9080/ssh 

Restart Apache.

sudo service apache2 restart

Now its possible to access GateOne through /ssh from external.

Let the world know

Install GateOne for Debian on Raspberry Pi 2

Running your own home server is nice, especially when it`s a Raspberry Pi and the power consumption is very low (hint: your light bulb consumes more). When you run your own server, from time to time you`ll have to access your server remotely. From inside your home network this is not a problem, but how about remote access? SSH is the preferred solution, but you need to have a port open, in and out. So when you are at a location where SSH is not allowed, you won`t be able to connect, and running your SSH server on port 80 or 443 isn`t always a solution:

  • Your web server might be running there or
  • The proxy you have to pass through will find it strange to see non HTML requests being made to that port

You might consider a remote desktop solution that allows you to connect to a terminal, but why not making use of a solution that exposes SSH server over HTTP? Say hello to GateOne. To know more what GateOne is check out their web site and GitHub repository

The code is hosted on GitHub, which means that you can use it for your own home use without having to pay for it.

As always, to get it started you must first prepare your system, install and then configure the software.

Preparations

Downloads

GateOne is available as several formats, and one is DEB. That`s nice, as my Raspberry Pi is running on Debian. To see the available downloads:

A pre-requisite is the tornado framework. So make sure you have python installed. Then download the deb file of tornado from their GitHub site.

Download tornado using wget:

wget https://github.com/downloads/liftoff/GateOne/python-tornado_2.4-1_all.deb

Download GateOne debian package, also via wget:

wget https://github.com/downloads/liftoff/GateOne/gateone_1.1-1_all.deb

Prepare packages

Besides the downloaded packages, you`ll need python and python-support.

Python

sudo apt-get install python

Python-support

sudo apt-get install python-support

Installation

Python tornado

sudo dpkg -i python-tornado_2.4-1_all.deb

GateOne

sudo dpkg -i gateone_1.1-1_all.deb

Configuration

The above dpkg command installed GateOne in the folder /opt/gateone. When started, GateOne reads its configuration from a file named server.conf. This file is only created after GateOne was run at least once (or you copy another version into the directory). Next step therefore is to run GateOne and then stop it to be able to alter the default configuration. Run GateOne to let it create the configuration file:

sudo ./gateone.py

End the program (ctrl-c). As a result, server.conf will be now available.

Parameters

I`ll run GateOne behind a proxy that will do the SSL stuff, so I can disable ssl

disable_ssl = True

On port 443 my proxy is running, so I must change the port GateOne is going to use.

port = 9080

To make connections to this port, add it to origins

origins = “https://www.itsfullofstars.de:8081;http://localhost;http://127.0.0.1”

This is the basic GateOne configuration. My reverse proxy will handle the TLS part, so I did not have to configure GateOne for this. Of course, best practice is to also make sure GateOne only accepts TLS secured connections. After all, I`ll transmit a password. But the proxy and GateOne run on the same host, and I`ll use GateOne only for external access. I think in this special case I can ignore the additional security.

Run GateOne.

Access it via HTTP inside a browser.

See the log output given in the console.

GateOne as a service

Of course you do not want to run GateOne manually to be able to use it later. You want to have it run at system startup as a service. The GitHub site of GateOne contains a install readme which covers this too: https://github.com/liftoff/GateOne/blob/master/INSTALL.txt

sudo update-rc.d gateone defaults

Start the service

sudo service gateone start

Check that GateOne is running

ps -ef | grep gateone

Let the world know