podcast bandwidth conservation

Podcasts are awesome

I don’t own a TV now and I don’t plan to own one during my boondocking adventure.  I have largely given up on commercial radio.  For those and several more reasons, over the past 10 years **podcasts have become my main source of information/entertainment.**  eBooks do the heavy lifting for subjects that are more complex or not near-real-time.

Podcast files can be big

Audio is much more miserly with bandwidth than video but it can still chew up available resources. Particularly when podcasters over-produce the audio with needlessly high sampling/encoding rates, stereo when there is just one input, etc.  There is a podcast out there that regularly puts out 900MB (!) audio-only shows.  Then they whine about bandwidth costs and beg for money.    Trying… hard… not to…. rant…..

This is how I minimize the impact of podcast downloads on my own and on shared bandwidth:

  1. subscribe only to podcasts I care about.  This takes some discipline;  in the past I had a tendency to see free stuff and immediately think “why not?”.  That can lead to digital hoarding.

  2. download only those specific episodes of the podcast that directly interest me.  No autodownloading.  This is also useful when the podcaster dorks up the feed and everyone re-downloads 30GB of eps or whatever. More discipline…

  3. pre-process the podcast files before I pull them down my limited bandwidth, making sure I don’t DL unnecessary data.  This step allows for some neat efficiencies, like changing out the water pump when you do the timing belt.  You’re already in there!

pre-processing

The catch-22 here is that to downsize the podcast file you have to download it, but you don’t want to download it until it’s downsized.  The solution is to download it somewhere else first, somewhere the bandwidth is less restricted than in your boondocking paradise.  Do the processing there, then download the result to your location.
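Here is a rough sketch of the "pull it down afterwards" half, assuming rsync over ssh.  The server name and directories are made up, so swap in your own.  The `-processed.mp3` suffix matches the naming I use for re-encoded files later in this post.  Set DRY_RUN=1 and it only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical names: replace with your own server and directories.
REMOTE="${REMOTE:-me@vps.example.com}"
REMOTE_DIR="${REMOTE_DIR:-podcasts}"
LOCAL_DIR="${LOCAL_DIR:-./podcasts}"

# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Fetch only the processed files from the server.
# --partial keeps half-finished transfers so a flaky link can resume.
pull_processed() {
    run rsync -av --partial \
        --include='*-processed.mp3' --exclude='*' \
        "$REMOTE:$REMOTE_DIR/" "$LOCAL_DIR/"
}
```

Running `DRY_RUN=1 pull_processed` first is a cheap way to check the paths before spending any bandwidth.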

You can buy a virtual server for cheap these days.  The cheapest ones are about $15/yr for a linux virtual with minimal RAM and diskspace.  Those are fine for our usage, and they can also do double duty for other services you might want to run.  Poor man’s VPN, proxy, webserver, cloud backup, whatever.  I currently have two virtuals from BuyVM, who I heartily endorse and feel great supporting with my dollars.   I have a $15/yr cheapo for doing this kind of processing, RSS handling, etc.  Since the TOS prohibits webservers on this cheapie I also have a $30/yr virtual I run webservers, databases, and other stuff on.  Eventually I will combine them into one.

Anyhow.

I’ll give you a peek into my rough (but functional) approach.

Getting the podcasts to the processing location

If hpodder were still in the Debian repository (or buildable by mortals) I would be using that.  It had a great CLI that would let you update / catch up / queue shows all together or separately.  The closest thing I could find to this functionality is the rather ungainly newsbeuter.  I don’t like it but it’s the best I’ve found for what I am doing.

Since it requires a chain of actions I call it with an alias:

alias catchpod='newsbeuter -r; podbeuter; processPodcasts.sh'

The shell script at the end does the processing.

Processing

I am not a programmer so this will be ugly.  We are going to loop through all the *.mp3 files and work with them one at a time as $MP3.
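The loop itself might look something like this.  It is only a skeleton: `process_one` is a placeholder name I made up for the per-file steps, and here it just prints the filename.  The `-e` test is there because an unmatched glob would otherwise hand the loop the literal string `*.mp3`.

```shell
#!/bin/bash
# process_one is a hypothetical stand-in for the per-file steps.
process_one() {
    printf 'would process: %s\n' "$1"
}

# Walk every .mp3 in the current directory, one at a time, as $MP3.
process_all() {
    for MP3 in *.mp3; do
        [ -e "$MP3" ] || continue   # glob matched nothing
        process_one "$MP3"
    done
}
```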

First I rename the files in case they are weird or messed up.  This kind of thing can be done in a hellacious one-liner but my lizard brain needs legibility more than efficiency:

# remove spaces
rename 's/ //g' *

# NPR puts weird chars in their filenames. This can complicate
# the processing later
# Note:  the '=' is ok and helps us relocate NPR files in the next step
# for now we will remove question marks, ampersands, and percent signs.
rename 's/[?&%]//g' *.mp3\?*

# now we can tidy up
for LONG in *=*
        do
        # skip the literal "*=*" when no filename contains an '='
        [ -e "$LONG" ] || continue
        mv -v -- "$LONG" "NPR-$RANDOM.mp3"
        done
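If you want to see what those two rename rules do to a name before letting them loose on real files, a tiny preview helper can stand in for them.  `preview_name` is a hypothetical helper, not part of my script, and unlike the second rule it applies the character stripping to any name you hand it.

```shell
#!/bin/sh
# preview_name: hypothetical helper that prints a name as it would look
# after both rules: spaces removed, then '?', '&' and '%' removed.
# The '=' is deliberately left alone.
preview_name() {
    printf '%s\n' "$1" | tr -d ' ?&%'
}
```

For a made-up name, `preview_name 'a b.mp3?x=1&y=2%20'` prints `ab.mp3x=1y=220`.  The Perl rename that Debian ships also accepts `-n` to show what it would do without renaming anything.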

Learn the file’s name without the extension.  This can help keep the lizard brain straight when working with variations of the filename.

BASENAME=$(basename "$MP3" .mp3)
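The same thing can be done with the shell's own parameter expansion, no external command needed.  This sketch prints the bare name plus the two variations used in the later steps.  `name_variants` is a name I made up for illustration.

```shell
#!/bin/sh
# name_variants: hypothetical helper showing the filename variations.
name_variants() {
    name="${1##*/}"        # drop any leading directory, like basename
    base="${name%.mp3}"    # drop a trailing .mp3
    printf '%s\n' "${base}" "${base}.wav" "${base}-processed.mp3"
}
```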

Decode the .mp3 to .wav for processing:

# name the output explicitly so the next step can rely on $BASENAME.wav
nice -n 19 lame --decode "$MP3" "$BASENAME.wav"

VOX the file (i.e. remove silence).  Here I am trimming any silence longer than 1.5 seconds down to 1.5 seconds.  Any non-silence sound <= .2 seconds in duration is assumed to be non-informative noise and would not reset the 1.5 second counter.

sox "$BASENAME.wav" temp.wav silence -l 1 0.2 1% -1 1.5 1%
mv temp.wav "$BASENAME.wav"

This doesn’t always save much but why pay to move silence?  In this example VOXing reduced the file by about 2.5%:

313005544 Sep 24 07:22 KCRW-left_right_center-...wav
305279260 Sep 24 07:22 temp.wav
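The percentage comes straight from the two byte counts.  A small helper does the arithmetic, using awk since plain shell math is integer-only.  `pct_saved` is a made-up name.

```shell
#!/bin/sh
# pct_saved BEFORE AFTER: percent reduction in size, one decimal place.
pct_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}
```

With the sizes above, `pct_saved 313005544 305279260` prints `2.5`.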

Normalize the audio level.  It bugs me when I have to adjust the volume between files.

I’ve got mine set pretty high here for now since I listen to podcasts on the motorcycle with earplugs in:

normalize-audio \
                -a -5dBFS \
                -v \
                "${BASENAME}.wav"

Re-encode at voice quality.

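# lame takes the output file as its second argument;
# its -o switch means "mark as non-original", not "output"
lame --preset voice "${BASENAME}.wav" "${BASENAME}-processed.mp3"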

You wouldn’t want to use the voice preset for music but it works great for podcasts.  In general the resulting encode is half the size of the original.  Here’s the comparison for the VOXed example from before:

28414721 Sep 24 07:20 KCRW-left_right_center.mp3
12575988 Sep 24 07:24 KCRW-left_right_center-processed.mp3

56% reduction in filesize after processing.  Woot!
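Strung together for a single file, the steps from this post might look like the sketch below.  This is not my actual processPodcasts.sh, just the same commands chained with `&&` so a failed step stops the chain.  The `run` wrapper and the DRY_RUN switch are additions for illustration, and the output filenames are passed to lame explicitly.

```shell
#!/bin/bash
# Run a command, or just print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Sketch only: decode, VOX, normalize and re-encode one file.
process_one() {
    MP3="$1"
    BASENAME=$(basename "$MP3" .mp3)
    run nice -n 19 lame --decode "$MP3" "$BASENAME.wav" &&
    run sox "$BASENAME.wav" temp.wav silence -l 1 0.2 1% -1 1.5 1% &&
    run mv temp.wav "$BASENAME.wav" &&
    run normalize-audio -a -5dBFS -v "$BASENAME.wav" &&
    run lame --preset voice "$BASENAME.wav" "$BASENAME-processed.mp3"
}
```

Try it with `DRY_RUN=1 process_one show.mp3` first to see the five commands it would run.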
