Command-line Audio on a Mac (Without Installing Anything)

Audio File Playback

Once afplay finishes playing the song, control returns to your shell (or you can press CTRL-C to stop playback early).

Background Audio:

If you want to continue using the shell without waiting for the process to finish, you can run afplay in the background:

$ afplay example.mp3 &
[1] 7946

In this example, 7946 is the process ID (PID) of the instance of afplay that's now running -- if you want to terminate the process before it ends you can use kill:

$ kill 7946

If you don't remember the PID, you can also do killall afplay, or:

$ jobs
[1]+  Running                 afplay example.mp3 &
$ fg %1 # then hit CTRL-C

(jobs and fg are part of a shell feature known as job control.)
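
If you're scripting this rather than typing it, you don't have to copy the PID off the screen: the shell keeps the PID of the most recent background job in $!. Here's a minimal sketch of that idea. play_for is a made-up helper name, not part of any Apple tool, and it works with any command, not just afplay.

```shell
# play_for SECONDS COMMAND [ARGS...]
# Hypothetical helper: run COMMAND in the background and stop it after SECONDS.
play_for() {
    seconds=$1
    shift
    "$@" &                      # start the player as a background job
    pid=$!                      # $! holds the PID of that job
    sleep "$seconds"
    kill "$pid" 2>/dev/null     # harmless if playback has already finished
    wait "$pid" 2>/dev/null     # reap the job so the shell forgets about it
    return 0
}
```

$ play_for 10 afplay example.mp3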

Although afplay doesn't say it directly, I believe it supports the following file formats:

So basically anything besides OGG files and some Windows stuff? (BTW, I found this information by running afconvert -hf, so it's possible that the same does not apply to afplay).

Other than that, the only functionality it really has is setting the volume, limiting the playback time, and playing songs at different rates (i.e. slower or faster).
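
To give an idea of how those options combine, here's a small sketch using only flags listed in the afplay help text in the Reference section. preview is a made-up helper name, and the particular values (15, 1.5, 0.5) are arbitrary choices of mine.

```shell
# preview FILE
# Hypothetical helper: play a short, sped-up, quieter snippet of FILE.
preview() {
    # -t 15  : play for 15 seconds
    # -r 1.5 : play at 1.5x the normal rate
    # -q 1   : use the high-quality setting for rate-scaled playback
    # -v 0.5 : play at reduced volume
    afplay -t 15 -r 1.5 -q 1 -v 0.5 "$1"
}
```

$ preview example.mp3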

Audio File Information

The estimated duration field comes in handy often. Hint: afinfo example.mp3 | grep "duration:" | cut -d' ' -f3 will get you the duration of an audio file in seconds.
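
If you'd rather see minutes and seconds, the pipeline from the hint can be wrapped up and fed to awk. This is only a sketch: audio_duration and format_mmss are names I made up, and the first one assumes afinfo prints the duration as the third space-separated field of its "duration:" line, exactly as the hint does.

```shell
# audio_duration FILE
# Print the estimated duration of FILE in (fractional) seconds.
audio_duration() {
    afinfo "$1" | grep "duration:" | cut -d' ' -f3
}

# format_mmss SECONDS
# Drop the fractional part and print the rest as M:SS.
format_mmss() {
    awk -v s="$1" 'BEGIN { s = int(s); printf "%d:%02d\n", s / 60, s % 60 }'
}
```

$ format_mmss "$(audio_duration example.mp3)"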

Audio File Format Conversion

Honestly I've never used this one (afconvert), and it supposedly can't write mp3 files (as far as I can tell it can read them, but has no mp3 encoder). This StackExchange thread has good directions.
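
Going by the usage text in the Reference section, converting a file to AAC in an .m4a container should look roughly like the sketch below. to_m4a is a made-up helper name and the 128000 bit rate is an arbitrary choice; the -f, -d and -b flags come from the afconvert help, and m4af and aac are the format names that the say man page also uses.

```shell
# to_m4a FILE
# Hypothetical helper: convert FILE to AAC audio in an .m4a file next to it.
to_m4a() {
    src=$1
    dst="${src%.*}.m4a"     # same name with the extension swapped for .m4a
    # -f m4af   : output file format (MPEG-4 audio)
    # -d aac    : output data format
    # -b 128000 : total bit rate in bits per second
    afconvert -f m4af -d aac -b 128000 "$src" "$dst"
}
```

$ to_m4a example.wav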

FFMPEG Equivalents

FFmpeg is another command-line utility capable of playing audio (and also video) -- it's huge and much more powerful than the af* utilities I've shown you, but it's not installed by default, so it's no fun :P. Regardless, here are some FFmpeg commands that provide functionality similar to what's shown above:
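
As a rough sketch of what such equivalents can look like, assuming the standard ffmpeg, ffplay and ffprobe binaries are installed and on your PATH (the ff_* function names are made up for illustration):

```shell
# Like afplay: play a file without opening a window, and exit when it ends.
ff_play() {
    ffplay -nodisp -autoexit -loglevel quiet "$1"
}

# Like the afinfo duration hint: print the duration in seconds.
ff_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Like afconvert: convert a file, letting ffmpeg pick the codec
# from the output file's extension.
ff_convert() {
    ffmpeg -i "$1" "$2"
}
```

$ ff_play example.mp3
$ ff_duration example.mp3
$ ff_convert example.wav example.mp3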

Speech Synthesis

This is one of my favorites to play with -- the say utility converts text to audible speech.
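
One practical use is having your Mac tell you when a long-running command is done. A minimal sketch, where announce is a made-up helper name and the spoken phrases are arbitrary:

```shell
# announce COMMAND [ARGS...]
# Hypothetical helper: run COMMAND, then speak whether it succeeded.
announce() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        say "Finished"
    else
        say "Failed with status $status"
    fi
    return "$status"        # pass the command's exit status through
}
```

$ announce make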

But wait, there's more. Apple's speech synthesis supports the TUNE format, which allows you to "shape the overall melody and timing of an utterance... for example ... to make an utterance sound as if it is spoken with emotion".

To demonstrate this, create a file named apple.txt (or whatever) with the following contents:

Command-line Music Player

Reference

afplay

$ afplay --help

    Audio File Play
    Version: 2.0
    Copyright 2003-2013, Apple Inc. All Rights Reserved.
    Specify -h (-help) for command options

Usage:
afplay [option...] audio_file

Options: (may appear before or after arguments)
  {-v | --volume} VOLUME
    set the volume for playback of the file
  {-h | --help}
    print help
  { --leaks}
    run leaks analysis
  {-t | --time} TIME
    play for TIME seconds
  {-r | --rate} RATE
    play at playback rate
  {-q | --rQuality} QUALITY
    set the quality used for rate-scaled playback (default is 0 - low quality, 1 - high quality)
  {-d | --debug}
    debug print output

afinfo

$ afinfo --help

   Audio File Info
   Version: 2.0
   Copyright 2003-2016, Apple Inc. All Rights Reserved.
   Specify -h (-help) for command options

Usage:
afinfo [option...] audio_file(s)

Options: (may appear before or after arguments)
  {-h --help}
    print help
  {-b --brief}
    print a brief (one line) description of the audio file
  {-r --real}
    get the estimated duration after obtaining the real packet count
  { --leaks }
        run leaks at the end
  { -i --info }
      print contents of the InfoDictionary
  { -u --userprop } 4-cc
      find and print a property or user data property (as string or bytes) [does not print to xml]
  { -x --xml }
      print output in xml format
  { --warnings }
      print warnings if any (by default warnings are not printed in non-xml output mode)

afconvert

$ afconvert --help

    Audio File Convert
    Version: 2.0
    Copyright 2003-2013, Apple Inc. All Rights Reserved.
    Specify -h (-help) for command options

Usage:
afconvert [option...] input_file [output_file]
    Options may appear before or after the direct arguments. If output_file
    is not specified, a name is generated programmatically and the file
    is written into the same directory as input_file.
afconvert input_file [-o output_file [option...]]...
    Output file options apply to the previous output_file. Other options
    may appear anywhere.

General options:
    { -d | --data } data_format[@sample_rate][/format_flags][#frames_per_packet]
        [-][BE|LE]{F|[U]I}{8|16|24|32|64}          (PCM)
            e.g.   BEI16   F32@44100
        or a data format appropriate to file format (see -hf)
        format_flags: hex digits, e.g. '80'
        Frames per packet can be specified for some encoders, e.g.: samr#12
        A format of "0" specifies the same format as the source file,
            with packets copied exactly.
        A format of "N" specifies the destination format should be the
            native format of the lossless encoded source file (alac, FLAC only)
    { -c | --channels } number_of_channels
        add/remove channels without regard to order
    { -m | --channelmap } list of input channels in output
        set a channel map, mapping which input channel goes to each output channel.
        channel number starts at zero. -1 makes a silent output channel.
        For example, to reverse a stereo stream: -m 1 0
    { -l | --channellayout } layout_tag
        layout_tag: name of a constant from CoreAudioTypes.h
          (prefix "kAudioChannelLayoutTag_" may be omitted)
        if specified once, applies to output file; if twice, the first
          applies to the input file, the second to the output file
    { -b | --bitrate } total_bit_rate_bps
         e.g. 256000 will give you roughly:
             for stereo source: 128000 bits per channel
             for 5.1 source: 51000 bits per channel
                 (the .1 channel consumes few bits and can be discounted in the
                 total bit rate calculation)
    { -q | --quality } codec_quality
        codec_quality: 0-127
    { -r | --src-quality } src_quality
        src_quality (sample rate converter quality): 0-127 (default is 127)
    { --src-complexity } src_complexity
        src_complexity (sample rate converter complexity): line, norm, bats minp
    { -s | --strategy } strategy
        bitrate allocation strategy for encoding an audio track
        0 for CBR, 1 for ABR, 2 for VBR_constrained, 3 for VBR
    --prime-method method
        decode priming method (see AudioConverter.h)
    --prime-override samples_prime samples_remain
        can be used to override the priming information stored in the source
        file to the specified values. If -1 is specified for either, the value
        in the file is used.
    --no-filler
        don't page-align audio data in the output file
    --soundcheck-generate
        analyze audio, add SoundCheck data to the output file
    --media-kind "media kind string"
        media kinds are: "Audio Ad", "Video Ad" 
    --anchor-loudness
        set a single precision floating point value to
        indicate the anchor loudness of the content in dB
        Note that for MP4 and M4* file types, this requires that the 
        --soundcheck-generate option is also enabled.
    --anchor-generate
        Analyze audio and add dialogue anchor level data to output file
        Note that for MP4 and M4* file types, this requires that the 
        --soundcheck-generate option is also enabled.
    --generate-hash
        generate an SHA-1 hash of the input audio data and add it to the output file.
    --codec-manuf codec_manuf
        specify the codec with the specified 4-character component manufacturer
        code
    --dither algorithm
        algorithm: 1-2
    --mix
        enable channel downmixing
    { -u | --userproperty } property value
        set an arbitrary AudioConverter property to a given value
        property is a four-character code; value can be a signed
        32-bit integer or a single precision floating point value.
        e.g. '-u vbrq <sound_quality>' sets the sound quality level
             (<sound_quality>: 0-127)
        May not be used in a transcoding situation.
    -ud property value
        identical to -u except only applies to a decoder. Fails if there is no
        decoder.
    -ue property value
        identical to -u except only applies to an encoder. Fails if there is no
        encoder.

Input file options:
    --decode-formatid data_format_id
        For input audio files with multiple data format layers (e.g. AAC_HE), 
        specify by format id (e.g. 'aach') which layer of the input file to
        decode.
    --read-track track_index
        For input files containing multiple tracks, the index (0..n-1)
        of the track to read and convert.
    --offset number_of_frames
        the starting offset in the input file in frames. (The first frame is
        frame zero.)
    --soundcheck-read
         read SoundCheck data from source file and set it on any destination
         file(s) of appropriate filetype (.m4a, .caf).
    --copy-hash
         copy an SHA-1 hash chunk, if present, from the source file to the output file.
    --gapless-before filename
        file coming before the current input file of a gapless album
    --gapless-after filename
        file coming after the current input file of a gapless album

Output file options:
    -o filename
        specify an (additional) output file.
    { -f | --file } file_format
        use -hf for a complete list of supported file/data formats
    --condensed-framing field_size_in_bits
        specify storage size in bits for externally framed packet sizes.
        Supported value is 16 for aac in m4a and m4b file format.

Other options:
    { -v | --verbose }
        print progress verbosely
    { -t | --tag }
        If encoding to CAF, store the source file's format and name in a user
        chunk. If decoding from CAF, use the destination format and filename
        found in a user chunk.
    { --leaks }
        run leaks at the end of the conversion
    { --profile }
        collect and print performance information

Help options:
    { -hf | --help-formats }
        print a list of supported file/data formats
    { -h | --help }
        print this help

say

SAY(1)         Speech Synthesis Manager     SAY(1)



NAME
       say - Convert text to audible speech

SYNOPSIS
     say [-v voice] [-r rate] [-o outfile [audio format options] | -n name:port | -a device] [-f file | string ...]

DESCRIPTION
       This tool uses the Speech Synthesis manager to convert input text to
       audible speech and either play it through the sound output device
       chosen in System Preferences or save it to an AIFF file.

OPTIONS
       string
     Specify the text to speak on the command line. This can consist of
     multiple arguments, which are considered to be separated by spaces.

       -f file, --input-file=file
     Specify a file to be spoken. If file is - or neither this parameter
     nor a message is specified, read from standard input.

       -v voice, --voice=voice
     Specify the voice to be used. Default is the voice selected in
     System Preferences. To obtain a list of voices installed in the
     system, specify '?' as the voice name.

       -r rate, --rate=rate
     Speech rate to be used, in words per minute.

       -o out.aiff, --output-file=file
     Specify the path for an audio file to be written. AIFF is the
     default and should be supported for most voices, but some voices
     support many more file formats.

       -n name, --network-send=name
       -n name:port, --network-send=name:port
       -n :port, --network-send=:port
       -n :, --network-send=:
     Specify a service name (default "AUNetSend") and/or IP port to be
     used for redirecting the speech output through AUNetSend.

       -a ID, --audio-device=ID
       -a name, --audio-device=name
     Specify, by ID or name prefix, an audio device to be used to play
     the audio. To obtain a list of audio output devices, specify '?' as
     the device name.

       --progress
     Display a progress meter during synthesis.

       -i, --interactive, --interactive=markup
     Print the text line by line during synthesis, highlighting words as
     they are spoken. Markup can be one of

     o   A terminfo capability as described in terminfo(5), e.g. bold,
         smul, setaf 1.

     o   A color name, one of black, red, green, yellow, blue, magenta,
         cyan, or white.

     o   A foreground and background color from the above list,
         separated by a slash, e.g. green/black. If the foreground color
         is omitted, only the background color is set.

     If markup is not specified, it defaults to smso, i.e. reverse
     video.

       If the input is a TTY, text is spoken line by line, and the output
       file, if specified, will only contain audio for the last line of the
       input.  Otherwise, text is spoken all at once.

AUDIO FORMATS
       Starting in MacOS X 10.6, file formats other than AIFF may be
       specified, although not all third party synthesizers may initially
       support them. In simple cases, the file format can be inferred from the
       extension, although generally some of the options below are required
       for finer grained control:

       --file-format=format
     The format of the file to write (AIFF, caff, m4af, WAVE).
     Generally, it's easier to specify a suitable file extension for the
     output file. To obtain a list of writable file formats, specify '?'
     as the format name.

       --data-format=format
     The format of the audio data to be stored. Formats other than
     linear PCM are specified by giving their format identifiers (aac,
     alac). Linear PCM formats are specified as a sequence of:

     Endianness (optional)
         One of BE (big endian) or LE (little endian). Default is native
         endianness.

     Data type
         One of F (float), I (integer), or, rarely, UI (unsigned
         integer).

     Sample size
         One of 8, 16, 24, 32, 64.

     Most available file formats only support a subset of these sample
     formats.

     To obtain a list of audio data formats for a file format specified
     explicitly or by file name, specify '?' as the format name.

     The format identifier optionally can be followed by @samplerate and
     /hexflags for the format.

       --channels=channels
     The number of channels. This will generally be of limited use, as
     most speech synthesizers produce mono audio only.

       --bit-rate=rate
     The bit rate for formats like AAC. To obtain a list of valid bit
     rates, specify '?' as the rate. In practice, not all of these bit
     rates will be available for a given format.

       --quality=quality
     The audio converter quality level between 0 (lowest) and 127
     (highest).

ERRORS
       say returns 0 if the text was spoken successfully, otherwise non-zero.
       Diagnostic messages will be printed to standard error.

EXAMPLES
    say Hello, World
    say -v Alex -o hi -f hello_world.txt
    say --interactive=/green spending each day the color of the leaves
    say -o hi.aac 'Hello, [[slnc 200]] World'
    say -o hi.m4a --data-format=alac Hello, World.
    say -o hi.caf --data-format=LEF32@8000 Hello, World

    say -v '?'
    say --file-format=?
    say --file-format=caff --data-format=?
    say -o hi.m4a --bit-rate=?

SEE ALSO
       "Speech Synthesis Programming Guide"



1.0         2017-02-16        SAY(1)
