Total PDF pages in subfolders across folder structure

Datetime: 2016-08-23 01:00:20    Topic: AWK

Last week, I wrote a script that ran through a folder structure and output the page count of every PDF in all folders and sub-folders, and also spit out a grand total.

While this worked well, what I really wanted was a script that just totaled PDF pages by sub-folder, without seeing all the file-by-file detail. After trying to retrofit the first script, I realized that was a waste of time, and started over from scratch.

The resulting script works just as I’d like it to, traversing a folder structure and showing PDF page counts by folder:

$ countpdfbydir
   47: ./_Legal
    2: ./_Medical-Dental
   15: ./_Medical-Dental/Kids
   11: ./_Medical-Dental/Marian
    2: ./_Medical-Dental/Rob
   35: ./_Personal Documents/Kids
   87: ./_Personal Documents/Marian
   28: ./_Personal Documents/Rob
   10: ./_Personal Documents/Rob/Golf
   12: ./_Personal Documents/Rob/Travel
-------------------------------------------------------------------
  249: Total PDF Pages

It took a few revisions, but I like this version; it even does some simplistic padding to keep the figures lined up in the output.
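The alignment boils down to right-aligning each count in a five-character column: four spaces before a one-digit number, three before two digits, and so on. As a small sketch, `printf`'s `%5d` conversion produces that same layout in one step (the helper name `pad_line` is hypothetical, not part of the scripts):

```shell
#!/bin/bash
# Right-align a count in a 5-character column, then append a label.
# %5d pads with leading spaces (four for one digit, three for two, ...);
# numbers wider than five digits are printed in full.
pad_line() {
	printf '%5d: %s\n' "$1" "$2"
}

pad_line 47 "./_Legal"
pad_line 249 "Total PDF Pages"
```

A side benefit of letting `printf` do the work is that there is no padding variable to reset between lines.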

Here’s what I came up with:

Subtotal PDF page counts by subfolder


#!/bin/bash

# Split command-substitution output on newlines only, so folder and
# file names that contain spaces survive intact
saveIFS=$IFS
IFS=$'\n'

baseDir=$(pwd)
myDirs=($(find . -type d))

grandtotalPages=0

for dir in "${myDirs[@]}"; do
	cd "$dir" || continue

	myFiles=($(find . -maxdepth 1 -name "*.pdf"))
	subtotalPages=0

	# We have PDFs in this dir, so loop through and count pages
	if [ ${#myFiles[@]} -ne 0 ]; then
		for file in "${myFiles[@]}"; do
			pageCount=$(mdls "$file" | grep kMDItemNumberOfPages | awk -F'= ' '{print $2}')
			if [ -z "$pageCount" ]; then
				# This PDF is missing a page count, so we skip it
				# echo "$file : ** Skipped - no page count **"
				continue
			fi
			# Increment a subtotal by directory and a running grand total
			subtotalPages=$((subtotalPages + pageCount))
			grandtotalPages=$((grandtotalPages + pageCount))
		done

		# Right-align the subtotal in a 5-character column
		printf '%5d: %s\n' "$subtotalPages" "$dir"
	fi

	cd "$baseDir" || exit 1
done

echo "-------------------------------------------------------------------"
printf '%5d: Total PDF Pages\n' "$grandtotalPages"

IFS=$saveIFS

I feared this would be incredibly slow, but it took only about 40 seconds to traverse a folder structure holding roughly a gigabyte of PDFs: about 1,500 files spread across 160 subfolders, totalling 5,306 PDF pages.
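Each of those files costs one `mdls` call, with `grep` and `awk` pulling the page count out of its output. If you want to check that parsing step on its own (even without a Mac), here's a small sketch that folds the `grep` into the `awk` pattern. It assumes `mdls` prints attributes as `name = value` lines, as the scripts' `awk -F'= '` already does; the function name `page_count_from_mdls` is hypothetical, and the sample input is made up for illustration, not captured output:

```shell
#!/bin/bash
# Read mdls-style "name = value" lines on stdin and print the page count.
# Prints nothing if kMDItemNumberOfPages is absent or not a plain number
# (e.g. "(null)"), so callers can keep testing for an empty result.
page_count_from_mdls() {
	awk -F'= ' '$1 ~ /^kMDItemNumberOfPages *$/ && $2 ~ /^[0-9]+$/ { print $2 }'
}

# Made-up sample input; on a Mac you would pipe in: mdls "some file.pdf"
printf '%s\n' 'kMDItemKind                    = "PDF document"' \
	'kMDItemNumberOfPages           = 4' | page_count_from_mdls
```

Used in place of the `grep ... | awk ...` pipeline, it saves one process per file and also rejects non-numeric values before they reach the arithmetic.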

Once I had this version working, I repurposed the original script to output file-level PDF page counts only for the current directory, so I can use that one when I want the details:

$ cd Home\ Stuff
$ pdfcountbyfile
    2: 2015-03-27 - Lowes.pdf
    4: 2015-07-14 - Home Depot.pdf
    1: 2015-09-03 - Home Depot.pdf
-----------------------------------------------------------------
    7: Total PDF pages in this folder

In case you want it, here’s the modified script that generates the file-level PDF page counts:

Count PDF pages by file in current folder


#!/bin/bash

# Split command-substitution output on newlines only, so file names
# that contain spaces survive intact
saveIFS=$IFS
IFS=$'\n'

myFiles=($(find . -maxdepth 1 -name "*.pdf"))
totalPages=0

for file in "${myFiles[@]}"; do
	# Drop the leading "./" that find adds
	prettyName=${file#./}
	pageCount=$(mdls "$file" | grep kMDItemNumberOfPages | awk -F'= ' '{print $2}')
	if [ -z "$pageCount" ]; then
		echo "$prettyName : ** Skipped - no page count **"
	else
		# Right-align the page count in a 5-character column
		printf '%5d: %s\n' "$pageCount" "$prettyName"
		totalPages=$((totalPages + pageCount))
	fi
done

echo "-----------------------------------------------------------------"
printf '%5d: Total PDF pages in this folder\n' "$totalPages"

IFS=$saveIFS

These are clearly not need-every-day scripts, but I like the information they provide (because I’m a data geek), and they were fun for my shell-scripting-challenged brain to figure out. I’m 99.9% positive the efficiency could be improved by a factor of 100, but this works well enough for my needs.
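On the efficiency point, one possible restructuring, sketched here under assumptions, is to gather a `pages<TAB>path` line for every PDF in a single traversal and let awk subtotal by folder in one pass, rather than running `find` and `cd` once per folder. The function name `subtotal_by_dir` and the tab-separated input format are hypothetical, chosen for this sketch. The producer loop shown in the comment still calls `mdls` once per file, so it trims the per-folder `find` and `cd` work but not the metadata lookups:

```shell
#!/bin/bash
# Subtotal PDF pages by folder in a single awk pass.
# Input: one "pages<TAB>path" line per PDF, e.g. "4<TAB>./_Legal/will.pdf".
# Output: one "count: folder" line per folder (in order of first
# appearance), a divider, and a grand total, padded to 5 characters.
subtotal_by_dir() {
	awk -F'\t' '
		$1 !~ /^[0-9]+$/ { next }          # skip PDFs with no page count
		{
			dir = $2
			sub(/\/[^\/]*$/, "", dir)      # drop the file name, keep the folder
			if (!(dir in pages)) order[++n] = dir
			pages[dir] += $1
			grand += $1
		}
		END {
			for (i = 1; i <= n; i++) printf "%5d: %s\n", pages[order[i]], order[i]
			print "-------------------------------------------------------------------"
			printf "%5d: Total PDF Pages\n", grand
		}'
}

# Hypothetical producer on macOS (not run here):
#   find . -name '*.pdf' | sort | while IFS= read -r f; do
#     p=$(mdls "$f" | grep kMDItemNumberOfPages | awk -F'= ' '{print $2}')
#     [ -n "$p" ] && printf '%s\t%s\n' "$p" "$f"
#   done | subtotal_by_dir

# Demo with made-up input
printf '3\t./_Legal/a.pdf\n2\t./_Medical-Dental/b.pdf\n4\t./_Legal/c.pdf\n' | subtotal_by_dir
```

Because awk groups by folder itself, the input does not need to arrive folder by folder; the `sort` in the producer only makes the output order predictable.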




