Download resources from SAP Cloud for your CI job

When running a CI job you may need to use some SAP tools, for instance the MTA builder or the Neo tools. Many CI servers include integrations for common build tools, or plugins are provided by the community or the vendor. Jenkins offers plugins for Maven, Ant or Node that let you easily integrate these into a CI job. If you have a CI job for SAP, it is your task to make the necessary tools available, as there are not many SAP plugins available for Jenkins.

Some tools you may need can be found on SAP’s tool site, for instance the MTA builder: a simple JAR file that is available for download and needed when you are working with MTA apps.

Before you can download the JAR file, you need to agree to the EULA.

This means that you cannot simply download the JAR from the command line:

wget https://tools.hana.ondemand.com/additional/mta_archive_builder-1.1.0.jar

Solution

Running the above wget command will not download the tool, but a web site. Some may recognize that this is very close to how Oracle protects its Java downloads. And the “solution” here is the same: send the right cookie via wget.

wget --header "Cookie: eula_3_1_agreed=tools.hana.ondemand.com/developer-license-3_1.txt" https://tools.hana.ondemand.com/additional/mta_archive_builder-1.1.0.jar

This also works for downloading other tools from the download page, like the Neo SDK:

wget --header "Cookie: eula_3_1_agreed=tools.hana.ondemand.com/developer-license-3_1.txt" https://tools.hana.ondemand.com/sdk/neo-javaee6-wp-sdk-2.137.0.1.zip

Let’s hope SAP provides some Jenkins plugins that take care of downloading these automatically.

Parallel download of files using curl

In a previous blog, I showed how to download files using wget. The interesting part was how to pass the authentication cookies to the server and how to use the file name given by the Content-Disposition directive when saving the file. The example in that blog downloaded a single file. What if you want to download several files from a server? Maybe hundreds or even thousands of files? Neither wget nor curl can take a list of locations from a file and download them in parallel. What you can do is start the downloads in sequence, letting wget/curl fetch the files one by one, as shown in my other blog: just use a loop until you reach the end of the list.
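
Such a sequential download is a simple loop (a sketch; urls.txt and headers.txt are the files used in the commands below):

while read -r url; do
  curl -O -J -H "$(cat headers.txt)" "$url"
done < urls.txt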

Commands

For downloading a large number of files in parallel, you’ll have to start the download command several times in parallel. To achieve this, several bash programs must be combined.

Create the list of files to download. This is the same as shown in my previous blog.

for i in {1..100}; do printf "https://server.fqdn/path/to/files/%07d/E\n" "$i" >> urls.txt; done

Start the parallel download of the files: 10 curl processes running in the background. This is an enhanced version of the curl download command from my previous blog; xargs is used to run several instances of curl.

cat urls.txt | nohup xargs -P 10 -n 1 curl -O -J -H "$(cat headers.txt)" >nohup.out 2>&1 &

Explanation

The first command creates the list of files to download and stores it in the file urls.txt.

The second command is more complex. First, cat prints the content of urls.txt to standard out. Then, xargs reads from standard in and uses each line as input for the curl command. For authentication and other headers, the content of the file headers.txt is used. The resulting curl call for the first line is:

curl -O -J -H "$(cat headers.txt)" https://server.fqdn/path/to/files/0000001/E

The parameter -P 10 tells xargs to run the command 10 times in parallel. It takes the first 10 lines of input and starts a new curl process for each one, so 10 curl processes run in parallel. To run more downloads in parallel, pass a higher value to -P, like 20 or 40.
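
To check how many curl processes are currently running, you can count them with pgrep:

pgrep -c curl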

To keep the download running in the background, nohup is used. All output is redirected to nohup.out: >nohup.out 2>&1

SSH

To keep the download running while logged on via SSH, the tool screen should be used. After logging on via ssh, call screen, run the above command, and hit CTRL+A, then D to detach from the screen session.

ssh user@server.fqdn
screen
cat urls.txt | nohup xargs -P 10 -n 1 curl -O -J -H "$(cat headers.txt)" >nohup.out 2>&1 &
CTRL+A+D
exit
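
To check on the download later, log on again and reattach the session:

screen -r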

Download files with leading zero in name using wget

In my previous blog I showed how wget can be used to download a file from a server, using HTTP headers for authentication and the Content-Disposition directive sent by the server to determine the correct file name. With the information from that blog it’s possible to download a single file from a server. But what if you must download several files? Maybe hundreds or thousands of files? Files whose names are created using a mask, adding leading zeros?

Add leading zeros

What you need is a list of files to download. I’ll follow the example from my previous post: my files follow a specific pattern, a number. All files are numbered from 1 to n. To make it more special / complicated, it’s not just 1 to n; a mask is applied: 7 digits in total, with leading zeros. 123 becomes 0000123, and 5301 becomes 0005301. In recent versions of Bash, you can use a FOR loop to loop through the numbers and printf to format the output and add the leading zeros. To get the numbers correctly formatted, the command is:

for i in {140000..140005}; do
  printf "%07d\n" "$i"
done

This prints the numbers 140000 to 140005 with leading zeros:
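
0140000
0140001
0140002
0140003
0140004
0140005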

Start download

Adding the wget command to the loop allows downloading the files. The execution flow is to let the FOR loop together with printf create the correctly masked number, and wget then downloads the file. After the file is downloaded, the next iteration of the FOR loop starts, and the next file is downloaded. Assuming that I have PDF documents named 0140000.pdf to 0140005.pdf on server http://localhost:9080, the FOR loop with wget is:

for i in {140000..140005}; do
  wget -nc --content-disposition "http://localhost:9080/$(printf "%07d" "$i").pdf"
done

Alternative

The above example uses wget. Of course, you can do the same using curl.
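
A curl version of the same loop could look like this (a sketch; -O -J makes curl save the file under the name from the Content-Disposition header, as in the parallel download example above):

for i in {140000..140005}; do
  curl -O -J "http://localhost:9080/$(printf "%07d" "$i").pdf"
done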

Download files with wget

A tool for downloading web resources is wget. It comes with a feature to mirror web sites, but you can also use it to download specific files, like PDFs. This is very easy and straightforward:

wget <url>
Example: wget http://localhost/doc.pdf

This instructs wget to download the file doc.pdf from localhost and save it as doc.pdf. It is not as easy when the web server is

  • requesting authentication, or
  • serving every file under a URL that ends in the same file name.

Authentication

The documentation of wget states that you can provide the username and password for BASIC authentication. But what about a web site that asks for SAML 2.0? You can pass HTTP headers to wget via the parameter --header. This feature makes it easy: log on to the server via a browser and then copy the headers. These headers contain the session information of your user and can be used by wget to connect as an authenticated user.

How to get the HTTP headers

  1. Log on to the web site
  2. Open developer tools
  3. Select a web resource
  4. Copy the HTTP headers. For cURL, it’s just selecting Copy all as cURL, which gives the complete cURL command. For just the headers, select Copy Request Headers.

Example:

User-Agent: Mozilla/5.0 Chrome/56
Accept-Encoding: gzip, deflate, sdch, br
Cookie: JSESSIONID=DBE1FED5C040B2DF7;
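
Passed directly on the command line, these headers would look like this (reusing the example URL from above):

wget --header "User-Agent: Mozilla/5.0 Chrome/56" --header "Accept-Encoding: gzip, deflate, sdch, br" --header "Cookie: JSESSIONID=DBE1FED5C040B2DF7;" http://localhost/doc.pdf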

Each line is one --header parameter for wget. It is not feasible to add all these headers to each wget request individually. For maintenance and better readability, these values should be read from a file. Problem: wget does not allow reading the header parameter from a file; there is no option like --header <file_with_headers>. What there is, is the .wgetrc file. This is the configuration file wget reads when called, and in this file it’s possible to define HTTP header values. For each HTTP header, create a new “header = <value>” entry in the file.
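
For the example headers above, the entries in .wgetrc would be:

header = User-Agent: Mozilla/5.0 Chrome/56
header = Accept-Encoding: gzip, deflate, sdch, br
header = Cookie: JSESSIONID=DBE1FED5C040B2DF7;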

With this configured, wget will always send these HTTP headers with each request. If the session cookies copied from the browser are valid, the requests are authenticated and wget is able to download the file.

File name

Sometimes the file you want to download has a generic URL: each file’s URL ends in the same file name on the server, for instance http://localhost/category/doc.pdf, or /uid/E.pdf. In such cases, wget will download the file and save it as doc.pdf or E.pdf. This is not a problem when you download just one file, but when you download more, like 20, wget numbers the files: E.pdf.1, E.pdf.2, E.pdf.3, …

This makes it hard to work with the files. A solution is to check whether the web server supports Content-Disposition. If so, the server sends the real file name in the HTTP response; it can be seen in the Content-Disposition header as filename.
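
Such a response header looks like this (using the file name from the example below):

Content-Disposition: attachment; filename=2399104_E_20170304.pdf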

With content-disposition, wget can save the file downloaded from /<UID>/E.pdf as <UID>.pdf instead of E.pdf. As the UID is unique, the file can easily be identified after download.

wget --content-disposition http://localhost/<uid>/E.pdf

Given the above example, the downloaded file is saved locally as 2399104_E_20170304.pdf

Download Oracle Java via wget

If you want or have to download Java from Oracle’s web site, you might know that you have to accept the “Oracle Binary Code License Agreement for Java SE” to activate the download link. If you have to download the binary from a computer without a browser, you have a problem: how do you click on something that needs to be accessed via a browser? What happens when you click on the link (technically) is that a cookie is set. The download site checks for that cookie and, when it is set, allows you to download the binary.

Knowing that, you can use wget to download Java without having to actually click on the checkbox. Just send the cookie with wget. The command for downloading Java SE 8u51 with wget is:

Command: wget --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u51-b16/jdk-8u51-linux-x64.tar.gz

Of course, you still have to accept Oracle’s license agreement.