How to use find to sort files across folders

Published by Tobias Hofmann on


Short version

You have files named File0.txt to File100.txt spread across different folders and want to move the first 30 of them into a separate directory. The commands below are for Mac users; Linux users can use plain find and mv instead of gfind and gmv.

For sorting FileNN.txt (character + number)

gfind -type f -printf "%f %p\n" | sort -n -k 1.5 | sed 's/.* //' | head -30 | xargs gmv -t ./A/

For sorting NN.txt (numeric filename)

gfind -type f -printf "%f %p\n" | sort -n | sed 's/.* //' | head -30 | xargs gmv -t ./A/

Preparation

For the commands below to work, you'll need GNU find. On a Mac, install the GNU versions of find and mv via Homebrew; they are installed with a g prefix, as gfind and gmv.

brew install findutils coreutils
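If the same commands should run on both macOS and Linux, one option is to resolve the tool names once and reuse them. This is only a minimal sketch; the variable names FIND and MV are placeholders chosen here, and the commands in the rest of this article do not depend on them.

```shell
# Prefer the g-prefixed GNU tools installed by Homebrew; fall back to the
# plain names, which are already the GNU versions on most Linux systems.
FIND=$(command -v gfind || command -v find)
MV=$(command -v gmv || command -v mv)

echo "find: $FIND"
echo "mv:   $MV"
```

With these set, "$FIND" and "$MV" can stand in for gfind and gmv in the commands that follow.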

Create a test folder structure. There will be 3 folders and several files in them.

mkdir 1
mkdir 2
mkdir 3

Create 101 files named FileNN.txt (File0.txt to File100.txt) with sample content and place each one in one of the three directories at random.

for i in {0..100}
do
  # Pick one of the folders 1, 2 or 3 at random
  Num=$((1 + RANDOM % 3))
  echo hello > "$Num/File${i}.txt"
done

After running the above script, ls -R shows the files distributed across the three folders.

Also create the target directory A:

mkdir A
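To experiment without touching a real folder, the whole setup can also be built in a throwaway directory. The sketch below assumes mktemp is available and, unlike the script above, distributes the files in a fixed rotation rather than randomly, so every run gives the same layout. The variable name sandbox is made up for this example.

```shell
# Build the same layout in a throwaway directory.
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir 1 2 3 A

# File0.txt .. File100.txt, rotated over the folders 1, 2, 3
# (deterministic on purpose, not random).
i=0
while [ "$i" -le 100 ]; do
  echo hello > "$(( i % 3 + 1 ))/File${i}.txt"
  i=$(( i + 1 ))
done

# Count the files that were created: 101 are expected.
find 1 2 3 -type f | wc -l
```

When you are done, the directory stored in $sandbox can simply be deleted.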

Commands

After the initial setup is done, we have several files in 3 directories. If you use find to get a list of all files, you’ll see that the output is not sorted.

gfind ./ -type f

The Unix command for sorting lines is sort. Applying it directly to the find output won't help here, because each line starts with the folder name, so the files end up sorted by folder first:

gfind ./ -type f | sort -n

The output is now sorted by folder name and only then by file name, not by file name alone. Taking the first 50 lines won't give you the 50 lowest-numbered files, because those are spread across all three directories.
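A three-line sample makes the effect visible without the full test folder. The paths are made up for illustration, and LC_ALL=C is only there to pin the collation so the result is the same on every system.

```shell
printf '%s\n' ./2/File1.txt ./1/File10.txt ./1/File2.txt | LC_ALL=C sort -n
# ./1/File10.txt
# ./1/File2.txt
# ./2/File1.txt
```

File1.txt ends up last although it has the lowest number: -n finds no number at the start of the line, so the lines are compared as plain text, and that comparison begins with the folder name.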

One solution: sort on the filename only, while keeping the complete path in the output so it can be piped to the move command. GNU find supports exactly this. Its -printf parameter controls the output format: %f prints the filename without its folder, while %p prints the full path including the folder.

gfind -type f -printf "%f\n"

The command prints only the filename.

To output the file with its path, use %p. In both cases \n puts each file on a new line.

gfind -type f -printf "%p\n"

Both output parameters can be combined. %f %p\n first prints the filename, then a space, then the path.

gfind -type f -printf "%f %p\n"

Applying sort on this output will sort on the file name only.

gfind -type f -printf "%f %p\n" | sort -n

Close, but not quite. If your filenames consist only of numbers, this already works. In the example, however, the filename starts with letters, so sort -n finds no number at the start of the line and falls back to comparing the text: File0.txt comes first, then File1.txt, but then File10.txt instead of File2.txt. To sort by the number, give sort an additional parameter: -k 1.5. It starts the sort key at the fifth character of the first field. Because every filename begins with the fixed four-character prefix File, sort skips that prefix and compares only the number.
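The difference is easy to see on a small sample. The three lines below are made up in the same "filename path" format; LC_ALL=C only pins the collation so the result is reproducible.

```shell
sample='File10.txt ./1/File10.txt
File2.txt ./3/File2.txt
File0.txt ./2/File0.txt'

# Without a key, -n finds no leading number, so the lines compare as text.
printf '%s\n' "$sample" | LC_ALL=C sort -n
# File0.txt ./2/File0.txt
# File10.txt ./1/File10.txt
# File2.txt ./3/File2.txt

# With -k 1.5 the comparison starts at the digits after "File".
printf '%s\n' "$sample" | LC_ALL=C sort -n -k 1.5
# File0.txt ./2/File0.txt
# File2.txt ./3/File2.txt
# File10.txt ./1/File10.txt
```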

Note: you can apply the same sort parameter without find, using just ls. This works as long as the folder part of every path has the same length. For folders named 1..9 it's fine, but when a folder name has two or more characters (like 10, or 213, or test), the offset needs to be adjusted.

List all files with directory name using ls:

ls -d1 */*

Sort by the number in the filename (in a path like 1/File0.txt the number starts at the seventh character):

ls -d1 */* | sort -n -k 1.7
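If the folder names differ in length, one alternative (not used in the rest of this article) is to split each line on the slash instead of counting characters from the start of the line. A small sketch with made-up paths, assuming every file sits exactly one folder deep:

```shell
# -t / makes "/" the field separator, so field 2 is the file name and
# 2.5 is its fifth character, however long the folder name is.
printf '%s\n' 10/File2.txt 1/File10.txt test/File1.txt | sort -t / -n -k 2.5
# test/File1.txt
# 10/File2.txt
# 1/File10.txt
```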

Back to find, with the sort key applied to the combined output:

gfind -type f -printf "%f %p\n" | sort -n -k 1.5

With the last command, the output is correctly sorted by filename. Now, how can this output be used to move the files to the target directory? Piping it straight to mv won't work: the first part, the filename, is not needed, only the second part, the path. The two parts are separated by a space, and sed can remove everything up to that space.

gfind -type f -printf "%f %p\n" | sort -n -k 1.5 | sed 's/.* //'
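One caveat worth knowing: .* is greedy, so the expression removes everything up to the last space in the line. That is fine for the test data, but a path containing a space gets cut in the wrong place. Here is a sketch with a made-up folder name, plus a variant that stops at the first space (assuming the file name itself contains no spaces):

```shell
line='File2.txt ./my dir/File2.txt'

printf '%s\n' "$line" | sed 's/.* //'      # dir/File2.txt
printf '%s\n' "$line" | sed 's/[^ ]* //'   # ./my dir/File2.txt
```

Even with the second form, xargs splits its input on whitespace by default, so paths with spaces need further care before they reach mv.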

The last step is to use mv to move the files to the target directory. To avoid moving all files, take only the first 30 with head. GNU mv is needed because the default macOS BSD mv does not support the -t parameter, which names the target directory before the files. xargs reads the paths from standard input and appends them as arguments to gmv.

gfind -type f -printf "%f %p\n" | sort -n -k 1.5 | sed 's/.* //' | head -30 | xargs gmv -t ./A/
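The pipeline can also be wrapped in a small function so that the count and the target become parameters. This is only a sketch: move_first_n is a name made up here, GNU find and GNU mv are assumed, and the target is expected to be a plain folder name inside the current directory. The extra ! -path test skips files that are already in the target, so a second run does not try to move them onto themselves.

```shell
FIND=$(command -v gfind || command -v find)   # GNU find assumed (-printf)
MV=$(command -v gmv || command -v mv)         # GNU mv assumed (-t)

# Hypothetical helper: move the n lowest-numbered FileNN.txt files found
# below the current directory into ./target. The body runs in a subshell,
# so n and target do not leak into the calling shell.
move_first_n() (
  n=$1
  target=$2
  mkdir -p "./$target"
  "$FIND" . -type f -name 'File*.txt' ! -path "./$target/*" -printf '%f %p\n' \
    | sort -n -k 1.5 \
    | sed 's/.* //' \
    | head -n "$n" \
    | xargs "$MV" -t "./$target/"
)
```

Called as move_first_n 30 A from the test folder, it does the same as the one-liner above.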

Result

Folder A now contains the first 30 files.

gls -1v ./A

 
