I tried writing an Apple Automator action to transcribe my videos. It works great, except that it's slow: it uses a lot of CPU and zero GPU. I'm coding with ChatGPT, so things are a little challenging. Is there a way to modify the current code so that it uses the GPU? whisper.cpp was suggested, but ChatGPT can't give me instructions for a good setup and doesn't seem to know its current API. I figure there's probably a way to do this directly with Whisper in 2025?
Here's the code I'm running in Automator on my Mac:
```bash
#!/bin/bash
###############################################################################
# Prepend Homebrew's bin directory so ffmpeg, ffprobe, and terminal-notifier
# are available
###############################################################################
export PATH="/opt/homebrew/bin:$PATH"

# Full path to the Whisper executable.
WHISPER_PATH="/Users/scotnery/Library/Python/3.9/bin/whisper"

###############################################################################
# Function: Open media in QuickTime, loop until ffprobe can read it,
# then quit QuickTime automatically.
###############################################################################
force_download_via_quicktime() {
  local FILE="$1"
  # Launch QuickTime with the file (asynchronously)
  open -a "QuickTime Player" "$FILE"

  local ATTEMPTS=20 # How many times to try ffprobe
  local DELAY=5     # Seconds between attempts
  local i
  for (( i=1; i<=ATTEMPTS; i++ )); do
    echo "Checking if file is fully available (attempt $i/$ATTEMPTS)..." >&2
    # If ffprobe can read container/duration, the file should be fully downloaded
    if ffprobe -v error -show_entries format=duration -of csv=p=0 "$FILE" >/dev/null 2>&1; then
      echo "File is fully readable. Quitting QuickTime..." >&2
      osascript -e 'tell application "QuickTime Player" to quit'
      return 0
    else
      echo "File not ready. Waiting $DELAY seconds..." >&2
      sleep "$DELAY"
    fi
  done

  # Timed out: couldn't confirm local availability
  echo "Timeout: could not confirm download after $ATTEMPTS attempts." >&2
  osascript -e 'tell application "QuickTime Player" to quit'
  return 1
}

###############################################################################
# Main loop: process each file passed to this script
###############################################################################
for FILE in "$@"; do
  # Extract file extension, parent directory, and parent folder name.
  EXT="${FILE##*.}"
  PARENT_DIR="$(dirname "$FILE")"
  PARENT_NAME="$(basename "$PARENT_DIR")"

  # If the folder name looks like "YYYY-MM-DD HH.MM.SS Name", drop the year and
  # the time stamp for a cleaner "MM-DD Name"; other names pass through unchanged.
  if [[ "$PARENT_NAME" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2} ]]; then
    CLEAN_FOLDER="$(echo "$PARENT_NAME" | sed -E 's/^[0-9]{4}-([0-9]{2}-[0-9]{2}) [0-9]{2}\.[0-9]{2}\.[0-9]{2} (.*)$/\1 \2/')"
  else
    CLEAN_FOLDER="$PARENT_NAME"
  fi

  # Define the expected transcript file (Whisper saves .txt with the same basename).
  OUTPUT_FILE="$PARENT_DIR/$(basename "$FILE" ."$EXT").txt"
  # Temporary log file for capturing Whisper output/errors.
  LOG_FILE="/tmp/whisper_log_$(basename "$FILE").txt"

  # Only process certain media extensions
  if [[ "$EXT" =~ ^(mp4|mov|mkv|mp3|wav|flac)$ ]]; then
    # 1. Show ephemeral "Downloading file..." notification
    osascript -e "display notification \"$(basename "$FILE")/${CLEAN_FOLDER}\" \
      with title \"Whisper AI Transcription 📥 Opening QuickTime...\""

    ########################################################################
    # 2. Attempt to force local download via QuickTime
    ########################################################################
    if ! force_download_via_quicktime "$FILE"; then
      osascript -e "display notification \"Skipping $(basename "$FILE")\" with title \"Not downloaded\""
      echo "Skipping $FILE because it never became fully available." >&2
      continue
    fi

    ########################################################################
    # 3. Now that the file is presumably local, gather info & show "Starting..."
    ########################################################################
    # Human-readable file size
    FILE_SIZE="$(du -h "$FILE" | cut -f1)"
    # File size in bytes
    SIZE_BYTES="$(stat -f%z "$FILE")"
    # (Optional) Estimated processing time
    EST_SEC="$(awk -v size="$SIZE_BYTES" 'BEGIN { printf "%.0f", 0.01657*sqrt(size) }')"
    CURRENT_TIME="$(date +%s)"
    ESTIMATED_END_TIME=$(( CURRENT_TIME + EST_SEC ))
    ESTIMATED_TIME="$(date -r "$ESTIMATED_END_TIME" +"%I:%M%p")"

    # Media duration via ffprobe
    FILE_DURATION_RAW="$(ffprobe -v error -show_entries format=duration \
      -of default=noprint_wrappers=1:nokey=1 "$FILE")"
    FILE_DURATION_SEC="$(printf "%.0f" "$FILE_DURATION_RAW")"
    DURATION_HOURS=$(( FILE_DURATION_SEC / 3600 ))
    DURATION_MINUTES=$(( (FILE_DURATION_SEC % 3600) / 60 ))
    DURATION_SECONDS=$(( FILE_DURATION_SEC % 60 ))
    FILE_DURATION_DISPLAY="$(printf "%02d:%02d:%02d" "$DURATION_HOURS" "$DURATION_MINUTES" "$DURATION_SECONDS")"

    # Show ephemeral "Transcription started..."
    osascript -e "display notification \"🎬$FILE_DURATION_DISPLAY 🧠$FILE_SIZE\n$(basename "$FILE")/${CLEAN_FOLDER}\" \
      with title \"Whisper AI Transcription 🐝 Starting...\""

    echo "Processing: $FILE..." >&2
    START_TIME="$(date +%s)"

    # 4. Run Whisper
    "$WHISPER_PATH" "$FILE" \
      --model small \
      --language en \
      --output_format txt \
      --output_dir "$PARENT_DIR" \
      > "$LOG_FILE" 2>&1
    EXIT_STATUS=$?

    END_TIME="$(date +%s)"
    ACTUAL_DURATION=$(( END_TIME - START_TIME ))
    # Don't name a variable SECONDS: Bash reserves it (seconds since the shell started).
    HOURS=$(( ACTUAL_DURATION / 3600 ))
    MINUTES=$(( (ACTUAL_DURATION % 3600) / 60 ))
    SECS=$(( ACTUAL_DURATION % 60 ))
    FORMATTED_DURATION="$(printf "%02d:%02d:%02d" "$HOURS" "$MINUTES" "$SECS")"

    ########################################################################
    # 5. If Whisper exited successfully and the transcript file is non-empty,
    # show success; otherwise, show the error dialog.
    ########################################################################
    if [[ $EXIT_STATUS -eq 0 && -s "$OUTPUT_FILE" ]]; then
      # Success block
      URI="$(python3 -c "import urllib.parse; import sys; print('file://' + urllib.parse.quote(sys.argv[1]))" "$OUTPUT_FILE")"
      terminal-notifier -title "Whisper AI Transcription ✅Done" \
        -message "🎬$FILE_DURATION_DISPLAY 🧠$FILE_SIZE ⏱️$FORMATTED_DURATION"$'\n'"$CLEAN_FOLDER/$(basename "$FILE")" \
        -open "$URI" \
        -timeout 0
    else
      # Error block (Whisper failed, or the transcript is missing or empty).
      # Keep only the end of the log (Whisper also prints the transcript there)
      # and strip quotes/backslashes so they can't break the AppleScript string.
      ERROR_MSG="$(tail -n 20 "$LOG_FILE" | tr -d "\"'\\\\")"
      osascript -e "display dialog \"❌ Error during transcription for: $(basename "$FILE")\nFolder: $CLEAN_FOLDER\nSize: $FILE_SIZE\nDuration: $FORMATTED_DURATION\n\nError:\n$ERROR_MSG\" with title \"Whisper AI Error\" buttons {\"OK\"}"
    fi
  else
    echo "Skipping non-media file: $FILE" >&2
  fi
done
```
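Some context on the CPU-only behavior: the `whisper` command in this script comes from the openai-whisper Python package, which runs on PyTorch and, unless `--device` is passed, uses CUDA when it is available and the CPU otherwise, so on a Mac it ends up on the CPU. whisper.cpp, the option ChatGPT suggested, normally uses Apple's Metal GPU backend by default on Apple Silicon builds. Below is a minimal sketch of a drop-in replacement for step 4, under these assumptions: whisper.cpp was installed with Homebrew (`brew install whisper-cpp`), its CLI is named `whisper-cli` (the binary name has changed between releases, so check what is actually in `/opt/homebrew/bin`), and a ggml model file sits at the hypothetical path in `WHISPER_CPP_MODEL`. The function name `transcribe_with_whisper_cpp` is also hypothetical. It converts the input to the 16 kHz mono 16-bit WAV that whisper.cpp's CLI expects and writes `<name>.txt` next to the source, so the existing success check keeps working. It has not been benchmarked here.

```shell
# Sketch only: a whisper.cpp replacement for step 4 (not benchmarked here).
# ASSUMPTIONS: whisper.cpp installed via Homebrew; CLI named "whisper-cli"
# (override WHISPER_CPP_BIN if your build uses another name); the model path
# below is hypothetical and must point at a downloaded ggml model file.
WHISPER_CPP_BIN="${WHISPER_CPP_BIN:-whisper-cli}"
WHISPER_CPP_MODEL="${WHISPER_CPP_MODEL:-$HOME/whisper-models/ggml-small.en.bin}"

# transcribe_with_whisper_cpp FILE OUTPUT_DIR   (hypothetical helper name)
# Writes OUTPUT_DIR/<file name without extension>.txt, the same path the
# openai-whisper command produces, so the script's success check still works.
transcribe_with_whisper_cpp() {
  local file="$1" out_dir="$2"
  local name stem tmp_dir wav status
  name="$(basename "$file")"
  stem="${name%.*}"

  if [ ! -f "$WHISPER_CPP_MODEL" ]; then
    echo "whisper.cpp model not found: $WHISPER_CPP_MODEL" >&2
    return 1
  fi

  tmp_dir="$(mktemp -d)" || return 1
  wav="$tmp_dir/audio.wav"

  # whisper.cpp's CLI expects 16 kHz, mono, 16-bit PCM WAV input.
  if ! ffmpeg -nostdin -loglevel error -y -i "$file" -ar 16000 -ac 1 -c:a pcm_s16le "$wav"; then
    rm -rf "$tmp_dir"
    return 1
  fi

  # -otxt writes a .txt transcript; -of is the output path WITHOUT extension.
  # Apple Silicon builds normally use the Metal GPU backend (-ng disables it).
  "$WHISPER_CPP_BIN" -m "$WHISPER_CPP_MODEL" -l en -otxt -of "$out_dir/$stem" -f "$wav"
  status=$?

  rm -rf "$tmp_dir"
  return "$status"
}
```

To try it, paste the function near the top of the Automator script and swap the step 4 call for `transcribe_with_whisper_cpp "$FILE" "$PARENT_DIR" > "$LOG_FILE" 2>&1`; `EXIT_STATUS=$?` and the rest of the script stay as they are. A model can be fetched once with, for example, `curl -L --create-dirs -o ~/whisper-models/ggml-small.en.bin https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.en.bin`; `small.en` is the English-only counterpart of the `small` model the script uses now.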