Parallel download of files using curl

In a previous blog, I showed how to download files using wget. The interesting part of that post was passing the authentication cookies to the server and using the file name given by the Content-Disposition header when saving the file. That example downloaded a single file. What if you want to download several files from a server? Maybe hundreds or even thousands of files? Neither wget nor curl can read a list of locations from a file and download them in parallel. You can run the download as a sequence, letting wget/curl fetch the files one by one as shown in my other blog: just use a for loop until you reach the end of the list.
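Such a sequential run can be sketched as follows. The file names urls.txt and headers.txt are the ones used throughout this post; file:// URLs and an example header stand in for the real server here so the sketch is self-contained and runnable:

```shell
# Create a tiny urls.txt so the loop below has something to fetch;
# in practice this file holds the HTTPS URLs of the real server.
printf 'hello\n' > /tmp/file1.txt
printf 'world\n' > /tmp/file2.txt
printf 'file:///tmp/file1.txt\nfile:///tmp/file2.txt\n' > urls.txt
printf 'X-Example-Header: 1\n' > headers.txt

# Sequential download: -O saves under the remote name, -J honours a
# Content-Disposition header when the server sends one, -H sends the
# stored request headers.
while read -r url; do
  curl -s -O -J -H "$(cat headers.txt)" "$url"
done < urls.txt
```

This works, but each file waits for the previous one to finish, which is exactly the limitation the rest of this post removes.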

Commands

To download a large number of files in parallel, you'll have to start the download command several times in parallel. To achieve this, several command-line tools have to be combined.

Create the list of files to download. This is the same as shown in my previous blog.

for i in {1..100}; do printf "https://server.fqdn/path/to/files/%0*d/E\n" 7 $i >> urls.txt; done
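The %0*d conversion takes the field width (7) from the argument list and zero-pads the counter, so every URL gets a fixed seven-digit directory name:

```shell
# The width 7 is read from the argument placed before the number itself.
printf "https://server.fqdn/path/to/files/%0*d/E\n" 7 1
printf "https://server.fqdn/path/to/files/%0*d/E\n" 7 100
# The two printed lines end in /0000001/E and /0000100/E respectively.
```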

Start the parallel download of the files: 10 instances of curl running in the background. This is an enhanced version of the curl download command from my previous blog; xargs is used to run several instances of curl.

cat urls.txt | nohup xargs -P 10 -n 1 curl -O -J -H "$(cat headers.txt)" >nohup.out 2>&1 &

Explanation

The first command creates the list of files to download and stores it in the file urls.txt.

The second command is more complex. First, cat prints the content of urls.txt to standard output. xargs reads from standard input and uses each line as input for the curl command. For authentication and other headers, the content of the file headers.txt is used. For the first line of input, the resulting curl call is:

curl -O -J -H "$(cat headers.txt)" https://server.fqdn/path/to/files/0000001/E

The parameter -P 10 tells xargs to run up to 10 commands in parallel. It takes the first 10 lines of input and starts a new curl process for each one, so 10 curl processes run at the same time. To run more downloads in parallel, pass a higher value to -P, like 20 or 40.
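The effect of -P and -n can be seen with a harmless stand-in for curl: three input lines, processed at most two at a time, one argument per invocation:

```shell
# Each "echo got" run stands in for one curl process; with -P 2,
# up to two of them execute concurrently, and -n 1 passes exactly
# one input line to each invocation.
printf 'a\nb\nc\n' | xargs -P 2 -n 1 echo got
```

Because the processes run concurrently, the order of the three output lines can vary between runs.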

To run the download in the background, nohup is used. All output is redirected to nohup.out: >nohup.out 2>&1

SSH

To keep the download running after logging off from an SSH session, the tool screen should be used. After logging on via ssh, call screen, run the above command, and hit Ctrl+A followed by D to detach from the screen session. Reattach later with screen -r.

ssh user@server.fqdn
screen
cat urls.txt | nohup xargs -P 10 -n 1 curl -O -J -H "$(cat headers.txt)" >nohup.out 2>&1 &
CTRL+A+D
exit

Download a HDS VOD stream

Video streaming is nice, but you need to be online to watch most of the videos made available. If you want to watch them offline or archive them for later reference, you depend on whether a downloadable version is available … or you capture the video. There are a lot of tools available for this, but it is not really easy for on-demand HDS (HTTP Dynamic Streaming) videos. However, there is a PHP script, AdobeHDS.php, that assists you in retrieving this kind of video for offline consumption.

Basically, you start the video and look in the network trace for the manifest. Copy its URL, as this is the URL that will be used by the PHP script. There are two query parameters you'll need:

  • g
  • hdcore

Any other parameter can be deleted. In Google Chrome, start the video and in the network tab of the developer tools you'll be able to see and copy the manifest.f4m URL.
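Trimming the copied URL down to those two parameters can be done by hand, or with a small shell sketch like this one (the URL below is made up; g and hdcore are the parameter names from the list above):

```shell
# Split the URL at the query string and keep only the g= and hdcore=
# parameters, dropping everything else.
url='http://server.akamaihd.net/z/path/vod/abc1111_,200,600,1200,.mp4.csmil/manifest.f4m?foo=1&g=ABCDEFGH&bar=2&hdcore=3.3.0'
base=${url%%\?*}     # everything before the '?'
query=${url#*\?}     # everything after the '?'
g=$(printf '%s\n' "$query" | tr '&' '\n' | grep '^g=')
hdcore=$(printf '%s\n' "$query" | tr '&' '\n' | grep '^hdcore=')
printf '%s?%s&%s\n' "$base" "$g" "$hdcore"
```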

Sample URL:

http://server.akamaihd.net/z/path/vod/abc1111_,200,600,1200,.mp4.csmil/manifest.f4m?g=ABCDEFGH&hdcore=3.3.0

Paste this URL as the manifest parameter of the PHP script and run it.

Command: php AdobeHDS.php --manifest "http://server.akamaihd.net/z/path/vod/abc1111_,200,600,1200,.mp4.csmil/manifest.f4m?g=ABCDEFGH&hdcore=3.3.0" --delete

This starts the download of the fragments in 8 threads and, after all of them are downloaded, joins them into a single FLV file.

[Screenshot: downloading fragments]

[Screenshot: result]

Download Oracle Java via wget

If you want or have to download Java from Oracle’s web site, you might know that you have to accept the “Oracle Binary Code License Agreement for Java SE” to activate the download link. If you have to download the binary from a computer without a browser, you have a problem: how do you click on something that needs to be accessed by a browser? What happens when you click on the link (technically) is that a cookie is set. The download site checks for that cookie and, when it is set, allows you to download the binary.

Knowing that, you can use wget to download Java without having to actually click on the checkbox: just send the cookie with wget. The command for downloading Java SE 8u51 with wget is:

Command: wget --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u51-b16/jdk-8u51-linux-x64.tar.gz
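The same cookie trick should work with curl as well. A sketch only, since the 8u51 link above points at one specific historical build: the command is assembled and printed here rather than executed, so nothing is fetched by accident.

```shell
# curl equivalent of the wget call above (assumption: the download
# site only checks for the cookie): -L follows redirects, -O keeps
# the remote file name, -H sends the license-acceptance cookie.
cookie="Cookie: oraclelicense=accept-securebackup-cookie"
url="http://download.oracle.com/otn-pub/java/jdk/8u51-b16/jdk-8u51-linux-x64.tar.gz"
echo curl -L -O -H "\"$cookie\"" "$url"
```

Drop the echo to actually start the download.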

Of course, you still have to accept Oracle’s license agreement.