Overview
Mac and Windows have different mechanisms for enumerating the files in a directory. This article discusses one approach to creating an abstraction class that presents a unified interface which allows files to be enumerated on either platform.
Grab this zip file: crossplat.zip. It contains test projects that can be built with both DevStudio 7 and Xcode 3. There's quite a bit of extra framework in there for something I've just started working on (which you can ignore), along with some platform-specific abstractions I've used on a number of other projects built on both Mac and PC.
The code I will be discussing here is defined in QzDirList.h, QzDirListWin.cpp, and QzDirListMac.cpp. This code gets a bit of exercise in the TestDirList function in QzUnitTest.cpp, which scans the project directory and dumps a list of all files and folders to the log file.
This code also relies on lower-level conventions implemented in QzSystem.cpp and UtfString.cpp, which include converting all strings to UTF-8 to avoid any dependency on a particular platform's favored representation of strings (UTF-16 on Windows, UTF-8 and UTF-32 on Macs, and plain old ASCII on some versions of Linux/Unix). If you need to write platform-independent code, you will have to find a string library that abstracts away these issues (especially if your app stores strings in files, and those files need to be readable on multiple platforms).
The intent of this article is to document how to enumerate files, not to provide a single implementation that everyone can use as-is.
Physical Drives
On Windows, extra work needs to be done to allow enumerating physical drives (as well as mapped drives and network drives). For MacOS, this issue does not exist, since all devices are mounted as yet another directory under the root folder.
The approach I use to enumerate drive names is the GetLogicalDriveStrings function. It fills a buffer with all of the drive names, each terminated by a '\0', with one more '\0' marking the end of the list.
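As a minimal sketch, the loop below walks such a buffer. The helper name SplitDriveStrings is hypothetical, and the parsing is deliberately kept apart from the Win32 call so it can be tried on any platform. On Windows the buffer would typically be filled by GetLogicalDriveStringsA, or by GetLogicalDriveStringsW followed by a UTF-8 conversion to fit the string conventions described above.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Split a buffer of NUL-terminated names that ends with one extra NUL,
// the layout GetLogicalDriveStrings writes (e.g. A:\ NUL C:\ NUL NUL).
// SplitDriveStrings is a hypothetical helper used for illustration.
std::vector<std::string> SplitDriveStrings(const char* buffer)
{
    std::vector<std::string> drives;
    // Each iteration consumes one name; an empty name marks the end of the list.
    for (const char* p = buffer; *p != '\0'; p += std::strlen(p) + 1) {
        drives.emplace_back(p);
    }
    return drives;
}
```

Each name comes back in the form Windows produces, such as C:\ with the trailing backslash included.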
An alternative is to call GetLogicalDrives and iterate over the bitmask it returns, using GetDriveType to find out what kind of drive (removable, fixed, remote, and so on) each set bit represents.
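For comparison, here is a sketch of the bitmask approach. It assumes, as GetLogicalDrives documents, that bit 0 stands for A:, bit 1 for B:, and so on. DriveRootsFromMask is a hypothetical name. Each root string ends with a backslash because that is the form GetDriveType expects. Note that this code picks the capitalization of the drive letters itself.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Turn a GetLogicalDrives-style bitmask (bit 0 = A:, bit 1 = B:, ...)
// into root paths such as C:\ for passing to GetDriveType.
// DriveRootsFromMask is a hypothetical helper used for illustration.
std::vector<std::string> DriveRootsFromMask(std::uint32_t mask)
{
    std::vector<std::string> roots;
    for (int bit = 0; bit < 26; ++bit) {
        if (mask & (std::uint32_t(1) << bit)) {
            std::string root = "?:\\";
            root[0] = static_cast<char>('A' + bit);  // letter chosen by our code
            roots.push_back(root);
        }
    }
    return roots;
}
```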
I personally prefer GetLogicalDriveStrings, since it lets Windows format the name strings, leaving the system configuration responsible for capitalization. This keeps the capitalization of the drive names consistent with other apps on the system.
When traversing up the directory structure, FindFirstFile and FindNextFile stop at the root directory of the current logical drive. A Linux-based implementation can stop at its root directory, but a Windows implementation needs to allow going one level higher and listing the names of all logical drives (both physical drives and drives mapped to remote machines).
The Windows implementation of QzDirList::ScanPath handles this by treating an empty path as a request to enumerate all drive labels, rather than as the root directory of a drive.
As a side note: some systems may list the A: drive as valid even when there is no physical device. This is a hardware/motherboard configuration issue. Attempting to access the A: drive on one of these systems results in a timeout (during which the app stops responding) while the system attempts to read from the missing drive. From a programming point of view, it appears as if there is no disk in the A: drive, when in fact there is no disk drive there at all. Do not assume there is a physical device just because the system claims that A: exists; at the same time, do not ignore this drive label, since the drive does exist on some systems.
Basic Enumeration
For Win32, the basic enumeration loop uses FindFirstFile and FindNextFile to iterate over all files in a directory:
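WIN32_FIND_DATA data;
HANDLE hFind = FindFirstFile(m_Path, &data);
if (INVALID_HANDLE_VALUE != hFind) {
    do {
        // FindFirstFile/FindNextFile have filled in data for one entry.
        // ... do something with data ...
    } while (FindNextFile(hFind, &data));
    FindClose(hFind);
}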
Under MacOS/Linux, opendir and readdir are used in much the same way:
DIR *pDir = opendir(reinterpret_cast<char*>(m_Path));
if (NULL != pDir) {
    dirent *pInfo;
    // opendir returns no entry, so readdir supplies the first one too.
    while (NULL != (pInfo = readdir(pDir))) {
        // ... do something with pInfo->d_name ...
    }
    closedir(pDir);
}
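A self-contained version of the POSIX loop might look like the following. ListDirectory is a hypothetical helper and not part of QzDirList. It skips the "." and ".." entries and sorts the result, since readdir makes no promise about the order of entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <dirent.h>
#include <string>
#include <vector>

// Collect the entry names in one directory using opendir/readdir.
// ListDirectory is a hypothetical helper used for illustration.
bool ListDirectory(const std::string& path, std::vector<std::string>& names)
{
    names.clear();
    DIR* pDir = opendir(path.c_str());
    if (pDir == nullptr) {
        return false;  // missing folder, no permission, not a directory, ...
    }
    // readdir must be called once before any entry is available.
    while (dirent* pInfo = readdir(pDir)) {
        // Skip the "." and ".." entries that readdir normally reports.
        if (std::strcmp(pInfo->d_name, ".") == 0 ||
            std::strcmp(pInfo->d_name, "..") == 0) {
            continue;
        }
        names.push_back(pInfo->d_name);
    }
    closedir(pDir);
    // readdir returns entries in no particular order, so sort them (byte order).
    std::sort(names.begin(), names.end());
    return true;
}
```

A more careful version would set errno to 0 before each readdir call, since a NULL return can mean either the end of the directory or an error.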
The one subtle difference in the looping logic is that FindFirstFile returns a valid file entry, so the loop needs to process this entry before calling FindNextFile for the first time. Since opendir does not return any data, readdir must be called to obtain the first file entry.
And it bears mentioning: FindFirstFile is one of the few Win32 functions that return INVALID_HANDLE_VALUE on failure instead of NULL. Make certain you are testing against the correct symbol.
For MacOS/Linux, there is another function that can be used: readdir_r. It is intended as a thread-safe version of readdir, for use when multiple threads are doing file transactions. Some programmers consider it harmful, since using it may result in race conditions or buffer overruns; however, this may only be true for certain implementations. I am not familiar enough with the subject to make any recommendations. If you think you may need to use readdir_r, do some research on it and make your own evaluation. (For what it is worth, glibc deprecated readdir_r starting with version 2.24 and recommends plain readdir, which modern implementations make safe to call concurrently on different DIR streams.)
Name Issues
Do not use wildcards in paths. Windows needs to have *.* appended to the path when calling FindFirstFile. You could just use * as the wildcard, but I have had problems with that on older versions of Windows; the legacy *.* still works, and in my experience it is safer for compatibility. On the other hand, opendir needs the exact folder name for the search, without any wildcards. As such, you need to keep the platform-specific wildcard hidden inside the implementation, with the rest of your app using only the folder name for the search.
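One way to keep the wildcard hidden is a single function that turns a folder name into whatever the platform's enumeration call expects. In this sketch, MakeSearchPath and its forWin32 parameter are hypothetical. The flag simply makes both branches testable, where real code would select one with #ifdef _WIN32. An empty path, which the Windows code treats as a request for the drive list, would be handled before this point.

```cpp
#include <cassert>
#include <string>

// Build the string handed to the platform's enumeration call from a plain
// folder name. MakeSearchPath and forWin32 are hypothetical; real code would
// choose the branch at compile time with #ifdef _WIN32.
std::string MakeSearchPath(const std::string& folder, bool forWin32)
{
    if (!forWin32) {
        return folder;  // opendir wants the exact folder name, no wildcard
    }
    std::string pattern = folder;
    if (!pattern.empty() && pattern.back() != '/' && pattern.back() != '\\') {
        pattern += '/';
    }
    pattern += "*.*";  // legacy wildcard; matches every entry on Windows
    return pattern;
}
```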
Win32 functions can use wildcards and other routines to scan only files that have a specific file extension (or, in the case of higher-level functions, arrays of file extensions). This is not directly supported by opendir, so filtering files by extension should be done by the app itself. By using the same filtering code on all platforms, you can avoid platform-specific quirks in filtering behavior.
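A shared filter can be as small as the sketch below. HasExtension is a hypothetical name. Matching extensions without regard to case is an assumption. It mirrors Windows, while most Linux file systems treat case as significant.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Return true if name ends in "." + ext, ignoring case. Case-insensitive
// matching is an assumed policy. HasExtension is a hypothetical helper.
bool HasExtension(const std::string& name, const std::string& ext)
{
    if (name.size() < ext.size() + 1) {
        return false;  // too short to hold ".ext"
    }
    std::size_t start = name.size() - ext.size();
    if (name[start - 1] != '.') {
        return false;
    }
    for (std::size_t i = 0; i < ext.size(); ++i) {
        // Cast to unsigned char: passing a negative char to tolower is undefined.
        // In the default "C" locale only A-Z are folded, so UTF-8 bytes pass through.
        unsigned char a = static_cast<unsigned char>(name[start + i]);
        unsigned char b = static_cast<unsigned char>(ext[i]);
        if (std::tolower(a) != std::tolower(b)) {
            return false;
        }
    }
    return true;
}
```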
Always use forward slashes ('/') for directory names. Windows will correctly handle these in file names (at least when using fopen and other standard routines), whereas MacOS does not treat backslashes ('\') as path separators. Avoid double slashes ('//') in file paths. Most operating systems ignore them, but I have seen some systems reject path names containing double slashes. For complete generality, it is best to normalize file names to remove duplicate slashes.
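A sketch of that normalization step follows. NormalizeSlashes is hypothetical. Converting backslashes is only appropriate for paths that came from Windows, since '\' is an ordinary file name character on MacOS. In that mode, the sketch keeps a leading pair of slashes so that UNC paths such as //server/share (written \\server\share on Windows) survive.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collapse runs of slashes into one. When windowsStyle is true, backslashes
// are converted to '/' first, and a leading "//" is kept so UNC paths such
// as //server/share still work. NormalizeSlashes is a hypothetical helper.
std::string NormalizeSlashes(const std::string& path, bool windowsStyle)
{
    std::string out;
    out.reserve(path.size());
    for (std::size_t i = 0; i < path.size(); ++i) {
        char c = path[i];
        if (windowsStyle && c == '\\') {
            c = '/';
        }
        bool repeatedSlash = (c == '/' && !out.empty() && out.back() == '/');
        bool uncPrefix = (windowsStyle && i == 1 && out.size() == 1);
        if (repeatedSlash && !uncPrefix) {
            continue;  // drop the duplicate slash
        }
        out.push_back(c);
    }
    return out;
}
```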
Parent Folders
A significant issue with Linux-based systems is the common use of soft links to folders in other parts of the directory structure. The problem arises when changing directory through a soft link: attempting to return to the parent directory will not return to the previous folder; instead, it changes to the parent of the linked folder.
An example of this from the command line would look something like this:
$ pwd
/home/lee
$ cd foo
$ pwd
/home/bob/some/other/path/foo
$ cd ..
$ pwd
/home/bob/some/other/path
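In this example, "/home/lee/foo" is a link to "/home/bob/some/other/path/foo". Issuing the commands "cd foo" and "cd .." does not return the user to the original directory. (Strictly speaking, the transcript shows the physical behavior of cd -P; POSIX shells such as bash default to cd -L, which tracks the logical path and hides the problem at the command line.) A program that calls chdir("..") gets the physical behavior, so the same problem arises when traversing directory trees from within a program.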
The QzDirList class handles this by initially finding the absolute path of the starting directory, then using this string for all relative traversals. When traversing up to the parent directory with UpOneLevel, the code trims the current folder name off the absolute path. This returns to the previous directory even after traversing through a link to some other location within the directory tree.
On the other hand, if you really did need to traverse to the parent folder of a soft-linked folder, you would need a different implementation of UpOneLevel that first updates the absolute path, then traverses up from there. Having never needed that kind of functionality, I have not supported it in QzDirList.
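The trimming approach can be sketched with plain string handling. ResolveStart, Descend, and UpOneLevelPath are hypothetical stand-ins for QzDirList's internals, which this article does not reproduce. realpath is the POSIX call used here to obtain the absolute starting path; GetFullPathName would be the rough Windows equivalent.

```cpp
#include <cassert>
#include <stdlib.h>   // realpath, free (POSIX)
#include <string>

// Resolve the starting folder once. realpath() follows soft links, so the
// result is a physical absolute path. ResolveStart is hypothetical.
bool ResolveStart(const std::string& path, std::string& absPath)
{
    // With a null second argument, POSIX.1-2008 realpath allocates the buffer.
    char* resolved = realpath(path.c_str(), nullptr);
    if (resolved == nullptr) {
        return false;
    }
    absPath = resolved;
    free(resolved);
    return true;
}

// Going down: append the child name to the tracked string, even if the
// child is a soft link. Descend is hypothetical.
std::string Descend(const std::string& absPath, const std::string& child)
{
    if (!absPath.empty() && absPath.back() == '/') {
        return absPath + child;
    }
    return absPath + "/" + child;
}

// Going up: trim the last component off the string instead of following
// "..", so a soft link leads back to where it was entered.
// UpOneLevelPath is hypothetical.
std::string UpOneLevelPath(std::string absPath)
{
    while (absPath.size() > 1 && absPath.back() == '/') {
        absPath.pop_back();  // ignore trailing slashes, but keep a lone "/"
    }
    std::string::size_type slash = absPath.rfind('/');
    if (slash == std::string::npos) {
        return absPath;      // not an absolute path; leave it unchanged
    }
    if (slash == 0) {
        return "/";          // parent of "/home" (and of "/") is "/"
    }
    return absPath.substr(0, slash);
}
```

Descending through a soft link and then calling UpOneLevelPath returns to the folder the link was entered from, because ".." is never handed to the file system. A Windows version would presumably also turn a drive root such as C:/ into an empty path, to request the drive list.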
This problem is not as significant an issue on Windows. The most likely source of aliasing on Windows is creating a drive mapping, either to a networked drive, or to a folder within a local drive. It is possible to create a mapped drive to a directory on a local drive, so two different paths can lead to the same physical directory. But path names are always relative to the drive name on Windows (even when the drive name is not explicitly stated in the file name), so Linux's traverse-to-parent problem does not exist.
File Attributes
For Windows, file attributes are returned as part of the WIN32_FIND_DATA structure (and can also be queried with GetFileAttributes). MacOS, however, does not return file attributes from calls to readdir. (MacOS and Linux do add a d_type field to struct dirent that reports whether an entry is a file or a directory, but some file systems leave it as DT_UNKNOWN, and it carries no other attributes.) You will need to call stat to find out additional information about the file; stat stores the attribute information in a struct stat. (Aside: Yes, some addlepated programmer really did use the exact same name for both a function and a struct.)
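The sketch below shows the extra stat call for one entry returned by readdir. EntryAttributes and GetEntryAttributes are hypothetical names. Note the struct keyword in the declaration, which C++ requires here because the name stat also refers to the function.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>

// The attributes a directory listing typically needs, filled from stat().
// EntryAttributes and GetEntryAttributes are hypothetical names.
struct EntryAttributes {
    bool isDir;
    bool isHidden;
    long long size;
};

bool GetEntryAttributes(const std::string& folder, const std::string& name,
                        EntryAttributes& attr)
{
    // readdir gives only the name, so build the full path for stat().
    std::string full = folder;
    if (!full.empty() && full.back() != '/') {
        full += '/';
    }
    full += name;

    struct stat info;  // "struct" is required: stat alone names the function
    if (stat(full.c_str(), &info) != 0) {
        return false;  // e.g. a dangling soft link, or the file was just deleted
    }
    attr.isDir = S_ISDIR(info.st_mode);
    attr.isHidden = !name.empty() && name[0] == '.';
    attr.size = static_cast<long long>(info.st_size);
    return true;
}
```

stat follows soft links and reports on the target. lstat would report on the link itself, which matters if your listing needs to show links differently.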
Windows allows files to be marked as "system". Mostly this is used to denote files that are part of the OS runtime, or are hidden files that the user should not touch (such as the pagefile or recycle bin). Essentially, this is just a second "hidden" flag, which can be used to hide system files from naïve end users. There is no direct equivalent on MacOS, so this flag is ignored in that implementation.
Detecting the read-only flag is useful when preparing to write files: if the file is read-only, the user can be prompted with a meaningful error message instead of a generic "write failed" error. On Windows, the read-only state is reported directly as the FILE_ATTRIBUTE_READONLY bit, which makes it easy to filter out read-only files when enumerating a directory.
However, MacOS/Linux does not have a single "read-only" flag for files. This information is stored in the st_mode field of struct stat, which includes the read/write/execute permissions for owner/group/others; in other words, the values set by chmod. I have left this test out of the MacOS code, since deciding which of these bits determines the read-only-ness of a file is more of a policy question.
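To make the policy question concrete, here are two hypothetical definitions of read-only-ness computed from st_mode. Neither is taken from QzDirList. Which one is appropriate, if either, depends on the app.

```cpp
#include <cassert>
#include <sys/stat.h>
#include <sys/types.h>

// Two possible "read-only" policies, both hypothetical, computed from the
// permission bits in struct stat::st_mode (the values set by chmod).

// Policy A: read-only if the owner has no write permission.
bool IsReadOnlyForOwner(mode_t mode)
{
    return (mode & S_IWUSR) == 0;
}

// Policy B: read-only only if nobody (owner, group, others) may write.
bool IsReadOnlyForEveryone(mode_t mode)
{
    return (mode & (S_IWUSR | S_IWGRP | S_IWOTH)) == 0;
}
```

A third option is access(path, W_OK) from <unistd.h>. It asks whether the calling process may write to the file given its user and group IDs, rather than looking at any single permission bit.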
Hidden files are denoted by a file attribute in Win32, whereas MacOS/Linux considers any file whose name starts with a '.' to be hidden. Since hidden files are easy to detect on both platforms, both implementations of QzDirList can filter them out.
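Because both implementations can produce the same per-entry information, the filtering itself can be shared. This hypothetical sketch mirrors the m_ShowDirs, m_ShowFiles, m_ShowHidden, m_ShowReadOnly, and m_ShowSystem flags from the QzDirList usage example under The Code. The struct layouts are assumptions, not QzDirList's actual members.

```cpp
#include <cassert>

// Hypothetical per-entry information gathered by either platform's code.
struct EntryInfo {
    bool isDir;
    bool isHidden;
    bool isReadOnly;
    bool isSystem;   // always false on MacOS/Linux, which has no such flag
};

// Hypothetical counterpart of the m_Show* flags in the usage example.
struct ShowFlags {
    bool showDirs;
    bool showFiles;
    bool showHidden;
    bool showReadOnly;
    bool showSystem;
};

// Apply the same filtering rules on every platform.
bool ShouldList(const EntryInfo& e, const ShowFlags& f)
{
    if (e.isDir ? !f.showDirs : !f.showFiles) return false;
    if (e.isHidden && !f.showHidden) return false;
    if (e.isReadOnly && !f.showReadOnly) return false;
    if (e.isSystem && !f.showSystem) return false;
    return true;
}
```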
File Timestamps
Most file timestamps on Windows are stored using the FILETIME structure, which contains a 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC). However, FAT-16 and FAT-32 file systems (which include some flash devices) store timestamps with lower precision. FAT timestamps are truncated to multiples of 2 seconds, and only the creation time has an extra field that records the fraction at 10 ms precision. That fraction is not always reported, and it is discarded when moving files from NTFS to FAT. A timestamp may therefore sometimes carry more precision, but you cannot rely on it.
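The arithmetic for reducing a FILETIME to whole seconds is short. This sketch assumes the 64-bit tick count has already been assembled from the structure's dwHighDateTime and dwLowDateTime halves. The function names are hypothetical. The epoch gap of 11,644,473,600 seconds is the distance from 1601 to 1970, and the rounding follows this article's round-to-nearest-second policy.

```cpp
#include <cassert>
#include <cstdint>

// FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC; time_t counts
// seconds since 1970-01-01 UTC. The epochs are 11644473600 seconds apart.
// The function names below are hypothetical.
const std::int64_t kTicksPerSecond = 10000000;
const std::int64_t kEpochGapTicks = 116444736000000000LL;

// Convert a 64-bit FILETIME value ((dwHighDateTime << 32) | dwLowDateTime)
// to Unix seconds, rounding to the nearest second.
std::int64_t FileTimeToUnixSeconds(std::uint64_t ticks)
{
    std::int64_t shifted = static_cast<std::int64_t>(ticks) - kEpochGapTicks
                           + kTicksPerSecond / 2;
    std::int64_t seconds = shifted / kTicksPerSecond;
    if (shifted % kTicksPerSecond < 0) {
        --seconds;  // floor division, so pre-1970 values round the same way
    }
    return seconds;
}

// Drop the odd second, matching the truncation to 2-second multiples
// described above. Only non-negative values matter: FAT dates start in 1980.
std::int64_t TruncateToFatSeconds(std::int64_t seconds)
{
    return seconds - (seconds % 2);
}
```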
Another inconsistency is that all of the FAT file systems timestamp files with local time (as opposed to NTFS, which uses UTC). To recover the correct time, you need to know the time zone and DST setting of the system on which the file was created.
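Programs rarely see raw FAT timestamps, since the operating system converts them when it reports file times. The sketch below makes the dependence on local time explicit. It decodes the packed 16-bit date and time words of a FAT directory entry and interprets the result with mktime, which assumes the current machine's time zone and DST rules. FatStamp, DecodeFatStamp, and FatStampToTimeT are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Fields of a FAT directory entry's packed date and time words.
// FatStamp, DecodeFatStamp, and FatStampToTimeT are hypothetical names.
struct FatStamp {
    int year, month, day, hour, minute, second;
};

FatStamp DecodeFatStamp(std::uint16_t packedDate, std::uint16_t packedTime)
{
    FatStamp s;
    s.year   = 1980 + ((packedDate >> 9) & 0x7F);  // bits 9-15: years since 1980
    s.month  = (packedDate >> 5) & 0x0F;           // bits 5-8: month, 1-12
    s.day    = packedDate & 0x1F;                  // bits 0-4: day, 1-31
    s.hour   = (packedTime >> 11) & 0x1F;          // bits 11-15: hour, 0-23
    s.minute = (packedTime >> 5) & 0x3F;           // bits 5-10: minute, 0-59
    s.second = (packedTime & 0x1F) * 2;            // bits 0-4: seconds / 2
    return s;
}

// The stored value is a local time, so convert it with mktime(). The answer
// is only right if this machine's time zone and DST rules match those of
// the machine that wrote the file.
std::time_t FatStampToTimeT(const FatStamp& s)
{
    std::tm tm = {};
    tm.tm_year  = s.year - 1900;
    tm.tm_mon   = s.month - 1;
    tm.tm_mday  = s.day;
    tm.tm_hour  = s.hour;
    tm.tm_min   = s.minute;
    tm.tm_sec   = s.second;
    tm.tm_isdst = -1;  // let mktime work out whether DST applied
    return std::mktime(&tm);
}
```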
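On MacOS/Linux, the traditional file timestamps in struct stat (such as st_mtime) are time_t values, which have a resolution of only one second. (Newer systems also provide nanosecond-resolution fields: st_mtim on Linux and st_mtimespec on MacOS.)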
For compatibility, you need to decide on the precision of timestamps to store with files. I round everything to the nearest second, since higher precision is seldom necessary. On Windows, my code uses time64_t, but stores the value as a full unsigned 32-bit value, so roll-over issues can be ignored until 2106. The Mac code, however, only has a signed 32-bit time_t (31 bits of magnitude), which will roll over in 2038. (This applies to 32-bit builds; 64-bit MacOS builds use a 64-bit time_t.)
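The two limits are easy to check: 2^31 - 1 seconds after 1970 is 2038-01-19 03:14:07 UTC, and 2^32 - 1 seconds is 2106-02-07 06:28:15 UTC. The sketch below shows unsigned 32-bit storage of the kind described for the Windows code. PackTimestamp, UnpackTimestamp, and FitsSigned32 are hypothetical names. Times before 1970 cannot be represented this way.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Store seconds-since-1970 as an unsigned 32-bit value. Valid from 1970
// until early 2106; earlier or later values wrap around.
// PackTimestamp and UnpackTimestamp are hypothetical names.
std::uint32_t PackTimestamp(std::int64_t seconds)
{
    return static_cast<std::uint32_t>(seconds);
}

std::int64_t UnpackTimestamp(std::uint32_t packed)
{
    return static_cast<std::int64_t>(packed);
}

// True if the value would survive a signed 32-bit time_t (the 2038 limit).
bool FitsSigned32(std::int64_t seconds)
{
    return seconds >= std::numeric_limits<std::int32_t>::min() &&
           seconds <= std::numeric_limits<std::int32_t>::max();
}
```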
The Code
I'm not going to reproduce all of the code in this article. The two versions of the QzDirList class are found in this zip file, which is close to a thousand lines of code in total. Look at QzDirListMac.cpp and QzDirListWin.cpp to see how the same functions are implemented differently for the two platforms.
The following code snippet shows how the example class could be used:
QzDirList dir;

// Before scanning the directory, set these flags to
// indicate what types of properties should be reported.
dir.m_ShowDirs     = true;
dir.m_ShowFiles    = true;
dir.m_ShowHidden   = false;
dir.m_ShowReadOnly = true;
dir.m_ShowSystem   = false;

// Now scan the current directory. This will build up
// a table of names that is stored in the object.
dir.ScanDirectory(CharToUtf("."), NULL);

printf("path: %s\n", dir.m_Path);

// Now loop over the array of file names and print them out.
for (U32 i = 0; i < dir.m_EntryCount; ++i) {
    printf("%s: %s\n", dir.m_pList[i].IsDir ? "dir " : "file", dir.m_pList[i].pName);
}