Friday, September 6, 2013

Python cross-platform handling of wildcard command line arguments

Windows and Linux shells handle wildcard arguments differently. Linux shells (at least bash) expand wildcard arguments before passing them to a program, while the Windows shell passes them through unexpanded. This causes trouble if you want a command line utility that works correctly on both Linux and Windows (or even one that handles wildcard arguments at all on Windows).
In Python, the typical way to expand a wildcard is with the glob module: glob.glob returns a list, and glob.iglob returns an iterator (which may be preferable if a large list is expected).
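
As a rough illustration of the difference between the two calls, here is a small sketch. The scratch directory, the file names, and the helper name expand_pattern are made up for the example; only glob.glob and glob.iglob come from the standard library.

```python
import glob
import os
import tempfile


def expand_pattern(pattern):
    """Return the matches for one pattern both as a list and as an iterator."""
    as_list = glob.glob(pattern)    # every match collected up front
    as_iter = glob.iglob(pattern)   # matches produced one at a time
    return as_list, as_iter


# Build a throwaway directory with a few files to match against.
demo_dir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt", "c.log"):
    open(os.path.join(demo_dir, name), "w").close()

matches, lazy_matches = expand_pattern(os.path.join(demo_dir, "*.txt"))
print(sorted(os.path.basename(m) for m in matches))
for path in lazy_matches:
    print(os.path.basename(path))
```

Both calls yield the same paths; the iterator just avoids holding the whole list in memory at once.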

Here's a solution that uses the argparse and glob modules:
import argparse
from glob import glob

def main(file_names):
    print(file_names)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # nargs='*' collects all positional arguments into a single list
    parser.add_argument("file_names", nargs='*')
    args = parser.parse_args()
    file_names = []
    # Expand each argument with glob. An argument without wildcards comes
    # back as is only if that file exists; otherwise glob returns [].
    for arg in args.file_names:
        file_names += glob(arg)
    main(file_names)
One caveat: I have noticed that Python and bash don't order the expanded lists the same way. bash sorts its expansions, while glob returns matches in whatever order the file system lists them. So if you need a deterministic input order, sort the resulting list yourself.
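
A minimal sketch of sorting the expansions, assuming you want each argument's matches sorted but the order of the arguments themselves preserved. The helper name expand_sorted is hypothetical.

```python
import glob


def expand_sorted(patterns):
    """Expand each pattern with glob and sort each pattern's matches."""
    file_names = []
    for pattern in patterns:
        # Sort each expansion separately so the argument order is kept.
        file_names.extend(sorted(glob.glob(pattern)))
    return file_names
```

Note that sorted() compares strings by code point, so the result is deterministic but may still differ from bash, which orders its expansions according to the current locale.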


  1. This is not quite correct behavior, in my opinion. I feel if the user specifies a filename and the file does not exist, the program should raise an error. If you pass a string that contains no wildcards and the file does not exist, glob will return an empty list, meaning the nonexistent file will be silently ignored. So, in my opinion, the right way is to check if the filename contains any glob tokens ("*", "?", or "[") and only run it through glob if it does.

    I really wish the argparse module had an option to handle globbing automatically...

    1. Hi furrykef,
      Thanks for the comment. You're right, I didn't think of that. I need to update this code to take that into account.

      In addition to what you said, there is another bug I noticed. Someone might want to pass in literals containing "*" or other characters that glob treats as wildcards ("*" is allowed in Linux filenames). The code, as written, runs everything through glob, and there's no way to tell it not to. When the shell processes wildcards, you can quote an argument to suppress expansion. With the code I have here, the program still treats such arguments as globs even when the shell doesn't.
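
The two fixes discussed in this thread could be sketched roughly as follows. This is not tested across platforms. The names expand_args, GLOB_CHARS, and expand_wildcards are hypothetical, and two choices here are assumptions rather than anything settled above: globbing is only done by default on Windows (on the theory that a POSIX shell has already expanded whatever the user wanted expanded), and a pattern that matches nothing is treated as an error.

```python
import glob
import os

GLOB_CHARS = "*?["  # the glob tokens named in the comment above


def expand_args(args, expand_wildcards=None):
    """Expand wildcard arguments and raise IOError for anything missing.

    expand_wildcards defaults to True only on Windows (an assumption:
    elsewhere the shell is expected to have expanded wildcards already).
    """
    if expand_wildcards is None:
        expand_wildcards = (os.name == "nt")
    file_names = []
    for arg in args:
        if expand_wildcards and any(ch in arg for ch in GLOB_CHARS):
            matches = sorted(glob.glob(arg))
            if not matches:
                # Design choice for this sketch: an empty match is an error.
                raise IOError("no files match pattern: %s" % arg)
            file_names.extend(matches)
        elif os.path.exists(arg):
            # No wildcard handling: keep the argument exactly as given.
            file_names.append(arg)
        else:
            raise IOError("file not found: %s" % arg)
    return file_names
```

In the script above, the loop over args.file_names would become file_names = expand_args(args.file_names). With expansion off, a quoted argument such as "*" that reaches the program from bash is kept as a literal filename instead of being globbed a second time.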