Mac/PC Abstraction, Part 1:
OS Primitives

Overview

Having spent a large part of my career writing code for both Windows and various embedded systems, I've had to write code that could be easily ported (preferably without changing anything more than a header file). Usually, this meant writing the code on Windows (where it's much easier to interactively edit and debug), then moving that code over to build the embedded version. And if you can point both compilers at the exact same file in the exact same directory, testing and maintenance are that much easier.

Lately, I've been writing code for the Apple/Mac platform. Which has been an amazing experience. And by amazing, I mean painful. DevStudio and Xcode share a certain similarity in their documentation: they do a fairly good job of telling you exactly what each function does, and a really poor job of indicating how those pieces go together to make a working program.

If you're a Windows programmer, you readily recognize the names Petzold, Prosise, Richter, and a host of others, and probably have many of their books on hand for reference, making your life easier.

My experience with Macs is that there are virtually no books covering the various libraries, and what few exist are out-of-date, applying to older versions of the Mac OS. Apple frequently changes and deprecates its APIs, so any Google search is likely to turn up references to obsolete code, or no answers at all. Therefore, figuring out how to get anything done is a frustrating time sink.

(This has also made me appreciate more the amount of effort Microsoft has put into backwards compatibility. Shake your fist in the direction of Redmond all you want, but programs and games written a decade ago are still able to run on modern systems.)

Worse, Apple is pushing developers towards Objective-C, much like Microsoft is pushing developers towards C#. Both are useless if you need to write portable code.

In short, useful information is hard to find, so I'm pulling together notes I've made while dealing with the exasperating experience of porting several apps to the Mac, and writing more apps from the ground-up that run on both Windows and Mac.

Note, however, that the code I deal with is either console code or pure OpenGL/DirectX code. Thus, I will make no mention of buttons, widgets, dialogs, or other GUI elements. If you need those, go have a look at wxWidgets. Its implementation and limited support may leave something to be desired, but it does work.

Low-Level Primitives

The best advice to start with is this: never include any Windows or Mac header files in the bulk of your code. Always use an abstraction layer to hide all of the differences and mutual incompatibilities. The fewer source files that can see the platform-specific headers, the easier the result will be to port.

This article (and the next few) will cover how I've gone about abstracting out those differences and mutual incompatibilities. In this article, I'm going to focus on low-level OS routines. Future articles will cover multi-threading and how to write the MacOS Carbon equivalent of WinMain().

Grab this zip file: crossplat.zip. It contains a project that can be built with both DevStudio 7 and Xcode 3. There's quite a bit of extra framework in there for something I've just started working on (which you can ignore), along with some platform-specific abstractions I've used on a number of other projects to build on both Mac and PC.

For this article, all of the primitives I'm going to talk about are defined in QzSystem.h. There are two different source files, one for Windows (QzSystemWin.cpp) and one for Mac (QzSystemMac.cpp). This is not an exhaustive set of low-level routines, but they cover all of the functionality I've needed so far for several projects, including a big game engine I've been writing (currently 300,000+ lines of C++ code).

If you avoid over-reliance on platform-specific routines, not only will you find that you do not need much low-level functionality, but you will have a much easier time porting to other platforms. However, if you have to use an existing software library, portability will require major refactoring (presuming you have access to the source code, and permission to modify it) unless the library was written to support cross-platform builds.

Unicode

Windows programmers are fortunate: Microsoft now uses 16-bit wide chars for everything, so they never have to worry about Unicode issues ever again...

Uh... no. False. Wrong. Crud. But that is what I (and most Windows developers) assumed for a long time. The truth is that Unicode actually uses 21-bit code points. (There was a time when it was 32 bits, but it has been reorganized since then.) The extra five bits address the supplementary planes, which mostly only matter if you're dealing with Asian CJKV symbols. So those 16-bit wide chars aren't wide enough. But they are good enough if you only care about the Americas and most of Europe.
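To make the 21-bit point concrete: any code point above U+FFFF has to be split into a surrogate pair to fit in 16-bit wide chars, so one "character" can occupy two wchar_t slots on Windows. A minimal sketch (a hypothetical helper, not part of the QzSystem code):

```cpp
#include <cassert>

// Split a code point above U+FFFF into a UTF-16 surrogate pair.
// Example: U+1D11E (musical G clef) becomes 0xD834 0xDD1E, so a
// "single character" occupies two 16-bit wide chars.
static void EncodeSurrogatePair(unsigned codePoint,
                                unsigned short &hi,
                                unsigned short &lo)
{
    assert((codePoint >= 0x10000) && (codePoint <= 0x10FFFF));

    unsigned v = codePoint - 0x10000;             // 20 payload bits remain
    hi = (unsigned short)(0xD800 + (v >> 10));    // upper 10 bits
    lo = (unsigned short)(0xDC00 + (v & 0x3FF));  // lower 10 bits
}
```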

Apple gets around this by having Xcode make wchar_t a 32-bit type (though you can change this if you can find where the gcc setting is hidden). But Apple doesn't use wchar_t, so this is mostly a footnote... unless you try porting your Win32 code over to the Mac, only to find your memory addressing messed up because your code assumes wide chars are 16 bits instead of 32.

Apple sidesteps this by using UTF-8 strings for virtually all interface functions (or the CFString type). So any function that takes a char* (like fopen) is actually expecting a UTF-8 string (although for all I know, there may well be a minefield of exceptions to this).

If you're an old-time Windows programmer, you'll no doubt cringe when you realize that UTF-8 uses multi-byte characters. You remember all of those mb...() functions? The ones most of us were delighted to ditch in exchange for using wchar_t? Yeah, those.

I'm not going to go into the details of UTF-8 versus wchar_t versus any other representation. There are plenty of web pages out there that describe these things in great detail. Just realize two things: this is an issue you will need to address, and every platform API handles strings differently, so whatever approach you pick will be incompatible on at least one platform. Your code will need to have an abstraction layer that translates strings to whatever representation is used on the current platform in order to pass strings to/from the OS.
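As an illustration of what such a translation layer ends up doing (a sketch of my own, not the article's UtfString code), here is the core of encoding a single code point to UTF-8:

```cpp
#include <cassert>

// Encode one Unicode code point as UTF-8, returning the number of
// bytes written (1 to 4).  Sketch only: no validation of surrogate
// ranges or other ill-formed input.
static int EncodeUtf8(unsigned codePoint, unsigned char *pOut)
{
    if (codePoint < 0x80) {
        pOut[0] = (unsigned char)codePoint;
        return 1;
    }
    if (codePoint < 0x800) {
        pOut[0] = (unsigned char)(0xC0 | (codePoint >> 6));
        pOut[1] = (unsigned char)(0x80 | (codePoint & 0x3F));
        return 2;
    }
    if (codePoint < 0x10000) {
        pOut[0] = (unsigned char)(0xE0 | (codePoint >> 12));
        pOut[1] = (unsigned char)(0x80 | ((codePoint >> 6) & 0x3F));
        pOut[2] = (unsigned char)(0x80 | (codePoint & 0x3F));
        return 3;
    }
    pOut[0] = (unsigned char)(0xF0 | (codePoint >> 18));
    pOut[1] = (unsigned char)(0x80 | ((codePoint >> 12) & 0x3F));
    pOut[2] = (unsigned char)(0x80 | ((codePoint >> 6) & 0x3F));
    pOut[3] = (unsigned char)(0x80 | (codePoint & 0x3F));
    return 4;
}
```

Decoding is the mirror image, plus the validation this sketch skips.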

If you look in the example project directory, you'll find some UTF source files I put together while exploring the Unicode standard. They are not complete (mostly just included because they're used in the font rendering system I was experimenting with to get accented characters). You're probably best off ignoring them, unless you want to see one possible (and possibly wrong) implementation of UTF-8 code.

Thread-Safe Increment/Decrement

In DevStudio, InterlockedIncrement and InterlockedDecrement are used to twiddle counters without the expense of critical sections. Even better, these are compiler intrinsics, meaning that they are not actually function calls (at least not for release builds), so they are usually extremely fast and efficient. The main benefit, however, is that these operators do not suffer from deadlock.

The Mac has the equivalent functions IncrementAtomic and DecrementAtomic. Near as I can tell, these are always function calls, so they're a bit more expensive to use. However, the big difference is that the Windows Interlocked... routines always return the post-increment or post-decrement value (at least since the days of Windows NT). The Mac ...Atomic routines, however, return the pre-increment and pre-decrement values. Which will break reference-counting code that expects post-inc/dec logic (or any other code that needs a fast, thread-safe increment operator).

In my library, I abstracted these behind QzThreadSafeIncrement and QzThreadSafeDecrement. This makes the calls on Windows a little less efficient, but having an abstraction I don't have to think about more than makes up for it. And these days, worrying about the expense of a single function call is a bad case of premature optimization.

The Windows version of QzThreadSafeIncrement is simply this:

S32 QzThreadSafeIncrement(S32 *pValue) // Windows version
{
    return InterlockedIncrement(pValue);
}

However, the Mac version has to modify the return result to a post-increment instead of pre-increment value.

S32 QzThreadSafeIncrement(S32 *pValue) // MacOS version
{
    // Add a +1 since IncrementAtomic() returns the pre-increment value.
    return IncrementAtomic(pValue) + 1;
}
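For what it's worth, GCC-based toolchains (including the compilers Xcode ships) offer another possible backing for the same wrapper: the __sync atomic builtins, which already return the post-increment value, so no +1 fixup is needed. A sketch, with hypothetical names:

```cpp
#include <cassert>

// Alternative backing for the QzThreadSafeIncrement/Decrement
// wrappers using GCC's atomic builtins.  These return the
// post-operation value directly, matching the Windows semantics.
static int ThreadSafeIncrementSketch(int *pValue)
{
    return __sync_add_and_fetch(pValue, 1);
}

static int ThreadSafeDecrementSketch(int *pValue)
{
    return __sync_sub_and_fetch(pValue, 1);
}
```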

Critical Sections

Windows has four simple functions for critical sections: InitializeCriticalSection, EnterCriticalSection, LeaveCriticalSection, and DeleteCriticalSection.

The same functionality is accomplished on the Mac using pthread mutexes. My approach is to new up a critical section struct, storing only the pointer to it in the wrapper class. The rest of the source code only sees a void*. This keeps the actual implementation of the critical section hidden from the rest of the source code.

The following code is equivalent to InitializeCriticalSection:

    pthread_mutex_t *pMutex = new pthread_mutex_t;

    pthread_mutex_init(pMutex, NULL);

    m_pCritSection = reinterpret_cast<void*>(pMutex);

The following code is equivalent to EnterCriticalSection:

    pthread_mutex_lock(reinterpret_cast<pthread_mutex_t*>(m_pCritSection));

The following code is equivalent to LeaveCriticalSection:

    pthread_mutex_unlock(reinterpret_cast<pthread_mutex_t*>(m_pCritSection));

The following code is equivalent to DeleteCriticalSection:

    pthread_mutex_t *pMutex = reinterpret_cast<pthread_mutex_t*>(m_pCritSection);

    pthread_mutex_destroy(pMutex);
    delete pMutex;
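The File Operations example later in this article uses a QzCriticalLock scope guard built on this wrapper. A minimal pthread-only sketch of that pattern (the shape here is my assumption; see QzSystem.h for the real interface):

```cpp
#include <cassert>
#include <pthread.h>

// RAII scope guard: locks the mutex on construction, unlocks on
// destruction, so early returns and exceptions can't leave the
// critical section held.
class ScopedMutexLock
{
public:
    explicit ScopedMutexLock(pthread_mutex_t *pMutex)
        : m_pMutex(pMutex)
    {
        pthread_mutex_lock(m_pMutex);
    }

    ~ScopedMutexLock()
    {
        pthread_mutex_unlock(m_pMutex);
    }

private:
    pthread_mutex_t *m_pMutex;

    // Non-copyable: copying a lock guard would double-unlock.
    ScopedMutexLock(const ScopedMutexLock&);
    ScopedMutexLock& operator=(const ScopedMutexLock&);
};
```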

Sync Events

On Windows, thread synchronization is accomplished by CreateEvent, SetEvent, and WaitForSingleObject.

On the Mac, you need to use pthread condition variables to deal with all synchronization. This requires a condition variable created by pthread_cond_init and a mutex created by pthread_mutex_init. You signal the event with pthread_cond_signal, and wait for signals with pthread_cond_timedwait. (Note, this is probably not the only way to accomplish this with pthreads, but it is the approach with which I have had success.)

By example, the following code on Windows will create an event:

    m_hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);

Note that manual reset is always disabled and the event is always created in a non-signalled state. This is necessary for compatibility with pthread conditions, which always auto-reset, and cannot be created in a signalled state (though nothing stops you from explicitly signalling the event after you have created it).

Signal the event with the following function call:

    SetEvent(m_hEvent);

Note that PulseEvent is not supported; there is no pthread equivalent. More importantly, PulseEvent is unreliable, since race conditions in the kernel can result in a waiting thread never being signalled if two threads pulse and wait on the event at approximately the same time. It is best to avoid designing code that relies on PulseEvent, or you are likely to end up with an app that runs reliably for hours or days, then crashes, locks up, or exhibits strange behavior because a worker thread stopped processing data.

Meanwhile, another thread would wait for the event to be signalled:

    result = WaitForSingleObject(m_hEvent, milliseconds);

    if (WAIT_OBJECT_0 == result) {
        return QzSyncWait_Signalled;
    }

    if (WAIT_TIMEOUT == result) {
        return QzSyncWait_Timeout;
    }

    return QzSyncWait_Error;

And then, when you're done, this function call will release the resources associated with the sync event.

    CloseHandle(m_hEvent);

By comparison, pthreads require both a condition and a mutex. In my library, I've defined a struct that contains both objects:

struct QzSyncEvent_t
{
    pthread_cond_t  Cond;
    pthread_mutex_t Mutex;
};

By using a struct, I can new up a struct and store the pointer to it in the sync object. The advantage is that I can define a single QzSyncEvent class that is visible to the entire project, which only stores a single pointer. The platform-specific implementation then knows how to cast that pointer to allocate, use, and delete the resources. This saves me from having to litter the header files with lots of #ifdefs to expose different implementations, or using separate header files that must be manually synchronized when code changes are required.

Using pthreads, the following code is the equivalent to CreateEvent:

    QzSyncEvent_t *pEvent = new QzSyncEvent_t;
    m_hEvent = pEvent;

    pthread_cond_init(&(pEvent->Cond), NULL);
    pthread_mutex_init(&(pEvent->Mutex), NULL);

Signal the event by calling pthread_cond_signal:

    QzSyncEvent_t *pEvent = reinterpret_cast<QzSyncEvent_t*>(m_hEvent);

    pthread_cond_signal(&(pEvent->Cond));

To wait for an event and time out after some period of time (since infinite waits can result in deadlock), you have to fill in a timespec struct, which stores the time at which the condition will stop waiting. This is different from WaitForSingleObject, which takes a relative value indicating the maximum amount of time to wait.

The trick to pthread_cond_timedwait is that its timespec is an absolute time; in most other places where a timespec is used, it holds a relative time. The convention in my wrapper library uses a relative duration, so the code needs to query the current system time, add the maximum duration of the wait, and then issue the wait on the condition variable.

    QzSyncEvent_t *pEvent = reinterpret_cast<QzSyncEvent_t*>(m_hEvent);

    // pthread_cond_timedwait uses an absolute time to denote when to
    // stop waiting, instead of a relative value.  So we must query the
    // current time (which is in seconds + microseconds), convert to a
    // different struct (seconds + nanoseconds), and add the millisecond
    // timeout wait duration.
    timeval curr;
    gettimeofday(&curr, NULL);

    timespec abstime;
    abstime.tv_sec  = curr.tv_sec + milliseconds / 1000;
    abstime.tv_nsec = (curr.tv_usec * 1000) + ((milliseconds % 1000) * 1000000);

    // Protect against rollover of nanoseconds.
    abstime.tv_sec += abstime.tv_nsec / 1000000000;
    abstime.tv_nsec = abstime.tv_nsec % 1000000000;

    // The mutex must be held when calling pthread_cond_timedwait;
    // the wait atomically releases it, then re-acquires it on return.
    pthread_mutex_lock(&(pEvent->Mutex));
    result = pthread_cond_timedwait(&(pEvent->Cond), &(pEvent->Mutex), &abstime);
    pthread_mutex_unlock(&(pEvent->Mutex));

    if (0 == result) {
        return QzSyncWait_Signalled;
    }

    if (ETIMEDOUT == result) {
        return QzSyncWait_Timeout;
    }

    return QzSyncWait_Error;

Finally, free the resources:

    QzSyncEvent_t *pEvent = reinterpret_cast<QzSyncEvent_t*>(m_hEvent);

    pthread_cond_destroy(&(pEvent->Cond));
    pthread_mutex_destroy(&(pEvent->Mutex));

    delete pEvent;

Note: If you are using multiple events, you could save some system resources by creating only a single mutex and sharing it between all condition variables. This is probably safe if you're only dealing with a couple of threads, but in a system with a very large number of threads that frequently signal and wait on events, you'll probably find that reusing the same mutex will degrade performance since all of those threads are accessing the same mutex every time they test any of the condition variables.

Also, you may be wondering about WaitForMultipleObjects. There is no easy way of accomplishing a multiple wait using pthreads. And various internet wags insist that WaitForMultipleObjects is a Very Bad Thing™. If you want to write platform-independent code, your safest approach is to completely avoid logic that requires waiting for multiple events. Should you think that multiple waits are necessary for your project (probably because that's the way someone else implemented it, and you're the poor sod tasked with porting), you're much better off doing some significant research on synchronization to learn other ways of syncing threads.

Processor Info

There are a couple of processor info functions I like to have around, one to find out how many CPUs (or cores) there are, and one to report how busy the CPUs are.

Knowing the number of processors (either physical or hyperthreaded) can be useful when spawning worker threads. Some processor-intensive operations can be split up across multiple threads, such as ray tracing, batch image processing, or certain numerical simulations. If you know how many processors are on the machine, you can create one thread per processor and keep all of the processors busy.

On Windows, this information can be easily retrieved with GetSystemInfo (along with a few other bits of potentially useful information about the processor):

U32 QzGetProcessorCount(void) // Windows version
{
    SYSTEM_INFO sysInfo;
    GetSystemInfo(&sysInfo);

    return sysInfo.dwNumberOfProcessors;
}

The equivalent operation on MacOS requires going through the sysctl function:

U32 QzGetProcessorCount(void) // MacOS version
{
    int    mib[2];
    size_t len      = 0;
    S32    cpuCount = 1;

    // Set up a request for the number of CPUs.
    mib[0] = CTL_HW;
    mib[1] = HW_NCPU;

    // Make an initial call to find out how large a buffer is required
    // to hold the value.  Since the third parameter is NULL, no data
    // will be returned, but the len variable will be set to the size
    // of the buffer.
    sysctl(mib, 2, NULL, &len, NULL, 0);

    // The buffer should be a 32-bit int.
    if (4 == len) {

        // Repeat the function call, this time providing a cpuCount
        // variable where the return value will be stored.
        sysctl(mib, 2, &cpuCount, &len, NULL, 0);

        // Clamp the count to a reasonable range, in case it was not
        // filled in.  This should never be a problem, but protect
        // against the code being ported to a different system.
        cpuCount = ClampRange(1, cpuCount, 64);
    }

    return cpuCount;
}
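For what it's worth, toolchains newer than the ones this project targets offer a portable route: C++11's std::thread::hardware_concurrency(). A sketch with the same clamping as the sysctl version above:

```cpp
#include <cassert>
#include <thread>

// Portable (C++11) processor count.  hardware_concurrency() is
// allowed to return 0 when the count is unknown, so clamp to the
// same 1..64 range the sysctl version uses.
static unsigned ProcessorCountSketch()
{
    unsigned count = std::thread::hardware_concurrency();
    if (count < 1) {
        count = 1;
    }
    if (count > 64) {
        count = 64;
    }
    return count;
}
```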

Another useful bit of information is the amount of load on the CPU. Displaying the CPU load on the screen is helpful while debugging graphical apps.

Obtaining the CPU load is somewhat cumbersome on Windows. This specific value is not directly available, but GetSystemTimes returns the amount of time since boot that the processor has spent in the idle process, in kernel mode, and in user mode. Note that these values seldom add up to 100%, so trying to compute a proportional value using all three times tends to give erratic results.

By periodically sampling the CPU's idle count and tracking the elapsed time between samples, we can compute the CPU load:

static LARGE_INTEGER g_SystemTimeIdle  = { 0 };
static U64           g_SystemTimeStamp = 0;
static float         g_SystemTimePrev  = 0.0f;

float QzGetProcessorUsage(void) // Windows version
{
    // How much time has elapsed since the last time we read the clock?
    U64 timeStamp  = QzPrecisionClockRead();
    U64 frequency  = QzPrecisionClockFrequency();
    U64 duration   = timeStamp - g_SystemTimeStamp;

    // Protect against sampling issues with the system times.  These are
    // only updated with a certain granularity.  Limit this function from
    // updating its state more than twice per second.  Attempting to compute
    // CPU usage too often will result in noisy samples.
    if (duration < (frequency / 2)) {
        return g_SystemTimePrev;
    }

    // For this, we only care about how much time the CPU is idle.
    // If we compute 100% minus idle, we'll know how busy the CPU is.
    // If we were to try adding kernel and user time, the time value
    // might not be as accurate.  On some systems (or maybe some
    // versions of Windows), idle + kernel + user does not always add
    // up to 100% (possibly due to more than one user running processes
    // on the system).

    FILETIME idleFT, kernelFT, userFT;
    if (FALSE == GetSystemTimes(&idleFT, &kernelFT, &userFT)) {
        return g_SystemTimePrev;
    }

    // Convert FILETIME (which is two 32-bit words) into a LARGE_INTEGER
    // (which is a union) so we can do 64-bit math on the timestamps.
    LARGE_INTEGER idle;
    idle.LowPart  = idleFT.dwLowDateTime;
    idle.HighPart = idleFT.dwHighDateTime;

    // How much time has been spent in the idle process?
    // This value is total time for all CPUs.
    double timeDelta  = double(idle.QuadPart - g_SystemTimeIdle.QuadPart);

    // Record the new cycle count and time stamp.
    g_SystemTimeIdle  = idle;
    g_SystemTimeStamp = timeStamp;

    // Normalize the real time value to seconds.
    double wallTime = double(duration) / double(frequency);

    // Normalize the idle duration so it is in seconds.
    // The FILETIME values are 10,000,000 Hz.
    // Idle time is total for all processors, so we need to divide by
    // the number of processors to normalize time.
    double normalizedIdle = timeDelta / double(g_ProcCount * 10000000);

    // Now we can compute the fraction of time the CPU has been idle.
    double fractionIdle = normalizedIdle / wallTime;

    // Subtract that from 1.0 to figure out how busy the CPU is.
    float procUsage = float(1.0 - fractionIdle);

    // Clamp the result, since variations in timing can occasionally
    // produce negative values, or values greater than 1.0.
    g_SystemTimePrev = ClampRange(0.0f, procUsage, 1.0f);

    return g_SystemTimePrev;
}

On the other hand, MacOS can return the average CPU load by calling getloadavg. However, this function returns the average load for the past 60 seconds (as well as 5 minutes and 15 minutes, but this code is not using those values). If you're displaying a CPU load graph, it will not be as responsive as the Windows version of QzGetProcessorUsage.

float QzGetProcessorUsage(void) // MacOS version
{
    // Find out the average CPU usage.  This will return the average usage:
    //   [0] = previous 1 minute
    //   [1] = previous 5 minutes
    //   [2] = previous 15 minutes
    //
    // For this code, we only care about the previous 60 seconds (actually,
    // a shorter window would be more responsive and informative when
    // working with graphical apps, but that's not an option on this
    // platform).
    //
    double avg[3] = { 0.0, 0.0, 0.0 };
    getloadavg(avg, 1);
    return float(avg[0]);
}

Time Functions

The basic GetTickCount function on Windows returns an arbitrary timer with millisecond resolution (although usually around 16-millisecond quantization, depending on which version of Windows you're running and on how many cores are in the processor). For higher precision, you can use the timeGetTime function, paired with timeBeginPeriod to adjust the clock precision down to one millisecond.

On the Mac, the closest matching function I've found is Microseconds, which as advertised, returns an arbitrary timer with microsecond resolution. But it is stored in a struct as a pair of 32-bit values, so you have to mux the bits together to form a full 64-bit integer.

For platform compatibility, my implementation divides the result by 1000, so the final value is in milliseconds.

Another popular timer on Windows is QueryPerformanceCounter. The timestamp it returns has a different precision on every machine (usually the clock speed of the CPU, but not always). Since the highest-precision timer on Macs is Microseconds, you'll have to use that clock for any high-precision timing.

Granted, now that Macs use Intel CPUs, you could use the RDTSC instruction, but its value is not reliable on hyperthreaded or multicore processors.

U32 QzGetMilliseconds(void) // MacOS version
{
    UnsignedWide wide;
    Microseconds(&wide);

    // Mux the two portions of the timestamp into a single 64-bit value,
    // then divide the result to convert microseconds to milliseconds.
    return U32(((U64(wide.hi) << 32) | U64(wide.lo)) / 1000);
}

Another common timestamp on Windows is FILETIME, which is a relative value from January 1, 1601, which increments with a 10 MHz clock. On Macs, file times use time_t values, which are measured in seconds, starting from January 1, 1970. (Unless you're using stat64() to get the timestamps, which uses timespec which contains nanoseconds, or utimes which uses struct timeval which contains microseconds — at least for the OSX implementations.)

The following code shows how to convert between FILETIME and time_t numerical ranges:

// The following magic numbers are used to map to and from Win32's FILETIME
// values (which have a 10 MHz frequency).
//
// January 1st, 1970 in FILETIME: 0x019DB1DED53E8000 == 116444736000000000
//
#define c_FileTimeFor1970     116444736000000000ULL
#define c_FileTimeFrequency   10000000

U64 QzTimeConvertCrtToFileTime(U32 timestamp)
{
    // No rounding bias in this direction: the conversion is exact.
    // The half-frequency bias belongs only in the FILETIME-to-CRT
    // direction; adding it in both directions would make a round
    // trip gain a full second.
    return (U64(timestamp) * U64(c_FileTimeFrequency))
          + U64(c_FileTimeFor1970);
}

U32 QzTimeConvertFileTimeToCrt(U64 timestamp)
{
    // Add a bias of half the clock frequency to minimize truncation issues
    // when going back and forth between FILETIME and CRT times.
    U64 t = timestamp - U64(c_FileTimeFor1970) + U64(c_FileTimeFrequency / 2);
    return U32(t / U64(c_FileTimeFrequency));
}
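If you want to convince yourself the magic number is right, it can be rebuilt from first principles in a few lines:

```cpp
#include <cassert>
#include <stdint.h>

// Recompute Win32's FILETIME value for January 1st, 1970.  Between
// 1601 and 1970 there are 369 years; 92 of those are divisible by 4,
// but 1700, 1800, and 1900 are not leap years, leaving 89 leap days.
static uint64_t FileTimeFor1970()
{
    uint64_t days    = 369ULL * 365ULL + 89ULL;   // 134774 days
    uint64_t seconds = days * 86400ULL;           // 11644473600 seconds
    return seconds * 10000000ULL;                 // 10 MHz tick count
}
```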

To get ahead of myself by a couple of articles, another approach to timers is to have the windowing system send a message at regular time intervals. On Windows, the SetTimer function can be used to generate WM_TIMER messages.

I'm not going to go into full detail on how to accomplish this on Macs, since it is part of the whole windowing class system required for a Carbon app, which I'll dive into in part 4 of this series of articles.

The basic approach is to use NewEventLoopTimerUPP to define a timer, which will issue a callback every timer interval. The practical difference from WM_TIMER is that this callback is very precise, but not predictable. The WM_TIMER message is only as precise as GetTickCount, which usually means it arrives at some interval that is a multiple of 16 milliseconds (or whatever the clock precision is on the installed version of Windows). If you instead use NewEventLoopTimerUPP to request a callback every 10 milliseconds, the callback will arrive almost exactly every 10 milliseconds, making it very precise. However, the callbacks only arrive while the app has control of the CPU. When another task takes over the CPU, you stop getting callbacks, making the timer unpredictable.

To create a timer, you need to create a new event loop timer, then install the timer:

    EventLoopTimerUPP m_TimerUPP;
    m_TimerUPP = NewEventLoopTimerUPP(StubTimer);
    EventTimerInterval delay = 10.0f * kEventDurationMillisecond;
    InstallEventLoopTimer(GetMainEventLoop(), delay, delay, m_TimerUPP,
        this, &m_hTimer); 

Note that the callback needs to be a plain function. You can only use a C++ class method if that method is declared static. Since you're able to provide a void* when installing the timer, it's easy to pass in the this pointer, allowing the static callback function to redirect the call into your window class:

static pascal void StubTimer(EventLoopTimerRef hTimer, void *pContext)
{
    reinterpret_cast<QzMainWin*>(pContext)->HandleTimer(hTimer);
}

The fourth article in this series will go into more detail on using Mac timers.

File Operations

If you want to know the absolute path of a file on Windows, you can call the _fullpath function. There is no direct equivalent on Macs (realpath comes close, but only for paths that already exist), so you have to implement this yourself. The approach I took (which I do not like, but it works) is to get the current working directory, change the current working directory using the relative path to the file, read back that directory name, then change back to the previous directory. All done inside a critical section in case you're running multiple threads that could potentially access the file system.

    // If we're in a multi-threaded environment, we need
    // to protect the working directory from changing while
    // we're doing these directory manipulations.
    QzCriticalLock lock(g_pFileSection);

    char oldPath[c_MaxPathLength];
    char partial[c_MaxPathLength];
    char newPath[c_MaxPathLength];

    // Save the current working directory.
    getcwd(oldPath, c_MaxPathLength);

    // Extract the relative path to the target directory.
    strcpy(partial, dirname(relative));

    // Change the current working directory to that directory.
    chdir(partial);

    // Now we can read back the absolute path to the target directory.
    getcwd(newPath, c_MaxPathLength);

    // Then change back to the original working directory
    // so we don't mess up any other threads.
    chdir(oldPath);

Another useful Windows function is _splitpath, which also lacks a Mac OS equivalent. Granted, Macs don't have the concept of a drive name, but a convenience function for splitting a path into its directory, name, and extension is still useful. For this example, I'll just direct you to the implementation of QzSplitPath in QzSystemMac.cpp. It really just boils down to locating the last '.' and '/' in the filename. My implementation relies on several custom library functions, so giving an example here would be awkward (and you'll probably have your own favorite library functions to assist in the string manipulation — since I write lots of plain C code, I tend to avoid class-based utilities like string classes... plus, not all embedded toolkits have standard routines, and sometimes those routines are full of bugs, so custom implementations are often still required to support multiple platforms).
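For a rough idea of the shape, here is a sketch of the last-'.'-and-'/' logic (my own illustration, not the actual QzSplitPath code; it can ignore multi-byte UTF-8 concerns because '/' and '.' can never appear inside a UTF-8 continuation sequence):

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Split a path into directory, base name, and extension by locating
// the last '/' and the last '.' that follows it.
struct SplitPath_t
{
    std::string Dir;   // includes the trailing '/', if any
    std::string Name;  // base name without extension
    std::string Ext;   // includes the leading '.', or empty
};

static SplitPath_t SplitPathSketch(const char *path)
{
    SplitPath_t out;
    const char *slash = std::strrchr(path, '/');
    const char *base  = (slash != 0) ? (slash + 1) : path;
    const char *dot   = std::strrchr(base, '.');

    out.Dir = std::string(path, base - path);
    if ((dot != 0) && (dot != base)) {   // ".profile" has no extension
        out.Name = std::string(base, dot - base);
        out.Ext  = std::string(dot);
    }
    else {
        out.Name = std::string(base);
    }
    return out;
}
```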

One other comment on files: Apple uses UTF-8 strings for file names, and those UTF-8 strings should be canonically decomposed (AKA "normalized"). The specifics of that bit of Unicode lore are rather extensive. Essentially, it requires that any accent marks (diacritics) be separated from their associated characters and placed in a particular order.

For example, the symbol 0x00E9 (é) should be decomposed into the two Unicode values 0x0065 (e) and 0x0301 (combining acute accent), stored in that order. When encoded as UTF-8, this becomes the byte sequence 0x65 0xCC 0x81.

So whatever string library you use, make certain that when it converts strings to UTF-8, it canonically decomposes them (or "normalizes" them, depending on the terminology used to describe the string library). For some simplified examples, there are several UtfCanonicalDecompose... routines in the UtfString.cpp file. Again, you probably would not want to use that code for anything, since I implemented it as an experiment to better understand how Unicode works.