This page documents various odd Metal behaviors that I have not found anywhere else.
After raising my deployment target to iOS 14 (macOS 11, etc…) I got numerous new compile errors. These errors do not occur with the same compiler with a deployment target of iOS 13 (macOS 10.15, etc…)
It appears this is because various C99/C11 features were ‘removed’ (I gather they were never really supported) in Metal, which is based on C++, not on C. In particular, this affects restrict
, which is not a valid C++ keyword.
I am advised this works as intended. For restrict
anyway, the compiler keyword __restrict
still compiles, although I don’t know if it has any effect.
FB7831220
assert
As of iOS 14, assert
is defined as
#define assert(condition) ((void) 0)
For this reason it has no effect.
If you want assert
-like behavior in Metal, you can use
#define assert(X) if (__builtin_expect(!(X),0)) {float device *f = 0; *f = 0;}
along with the “metal shader validation” diagnostic in Debug GPU-side errors in Metal. It will not trip without this diagnostic.
For a production-ready solution, stdmetal ships with SM_ASSERT
and SM_PRECONDITION
cross-platform macros.
FB7731230
Note that, this sort of control flow can confuse the compiler and lead to issues debugging metal shaders. So you may want to pull this out if you have trouble attaching a debugger.
The best way I know of to do this is to concatenate .c
files into a .metal
file, and compile that.
COUNTER=0
rm -f "${SCRIPT_OUTPUT_FILE_0}"
while [ $COUNTER -lt ${SCRIPT_INPUT_FILE_COUNT} ]; do
tmp="SCRIPT_INPUT_FILE_$COUNTER"
FILE=${!tmp}
cat "$FILE" >> "${SCRIPT_OUTPUT_FILE_0}"
let COUNTER=COUNTER+1
done
.c
input files. You need to keep them up to date, alterantively you can use a file list${DERIVED_SOURCES_DIR}/your.metal
file. if [ $MTL_ENABLE_DEBUG_INFO = "INCLUDE_SOURCE" ]; then
MOREARGS="-gline-tables-only -MO"
else
MOREARGS=""
fi
metal -c -target air64-apple-ios14.0 $MOREARGS -MO -I${MTL_HEADER_SEARCH_PATHS} -F${HEADER_SEARCH_PATHS} -isysroot "${SDKROOT}" -ffast-math -o "${TARGET_TEMP_DIR}/Metal/your.air" -MMD "${DERIVED_SOURCES_DIR}/your.metal"
You may have to massage this a bit for your situation. For example, on macOS I use -target air64-apple-macos10.15
.
See blitcurveMetal.xcodeproj for an example.
You may be wondering, why the ${DERIVED_SOURCES_DIR}
? It’s because in environments where the sourcetree isn’t preserved, like CI, relying on its preservation may break incremental builds.
You may also be wondering, can’t we leverage the Xcode build system a bit more? The short answer is no.
The best way I know of to do this is to build a .a
file, or a script/xcodeproj to build one, and distribute that.
metal-libtool -static "${TARGET_TEMP_DIR}/Metal/mylib.air" -o "${BUILT_PRODUCTS_DIR}/libmylib.a"
${TARGET_TEMP_DIR}/Metal/mylib.air
${BUILT_PRODUCTS_DIR}/libmylib.a
Note that doing this may case DYPShaderDebuggerErrorDomain:1 "Failed instrument library"
, see error discussion below.
…nobody can use your xcodeproj directly, because the xcodeproj is readonly and you will get errors about “The file “project.pbxproj” could not be unlocked” (FB8095945)
Unfortunately, you have to instruct users to
MTLLINKER_FLAGS
to -L ${BUILT_PRODUCTS_DIR} -l mylib
. This assumes that your library has the libmylib.a
naming scheme. Also, this build setting is undocumented.See blitcurve for a complete example.
Prevoius versions of this FAQ did some reverse-engineering of how xcode resolves swiftpm packages in order to calculate where xcode keeps swiftpm projects in terms of various other environment variables.
However, the environment variables can take on a wide range of values (such as when archiving, building as part of a playground, etc.) and I don’t think there is one stable enough to use for this purpose.
FB8102669
- environment variable for swift packagesFB8095382
- environment variable for metal projectsThe best solution I’m aware of on this problem is to create a custom phase for copying the .metallib
into the target manually.
COUNTER=0
while [ $COUNTER -lt ${SCRIPT_INPUT_FILE_COUNT} ]; do
tmp="SCRIPT_INPUT_FILE_$COUNTER"
INPUT=${!tmp}
tmp="SCRIPT_OUTPUT_FILE_$COUNTER"
OUTPUT=${!tmp}
cp ${INPUT} ${OUTPUT}
let COUNTER=COUNTER+1
done
${BUILT_PRODUCTS_DIR}/my.metallib
${BUILT_PRODUCTS_DIR}/${EXECUTABLE_FOLDER_PATH}/my.metallib
FB8276893
I am aware of various “relaxed” behavior of metal math functions, relative to the familiar c or c++ stdlib behavior. For example,
In [Metal] fast math, pow(x,y) is defined to be exp2(y* log2(x))…For x in the domain [0.5, 2], the maximum absolute error is <= 2-22; Otherwise, if x > 0 the maximum error is <= 2 ulp; Or, the results are undefined. Based on hardware, this relaxed definition may cause issues for negative values. To get well defined for negative, one should use metal::precise:pow.
(this limitation is undocumented.)
Should you encounter such behaviors, you may have better results using a precise
function or disabling fno-fast-math
. However, see the discussion for simd_length
and fno-fast-math
below.
FB8904929
simd_length
I am aware of cases where the switching between simd_length
(metal::length
), simd_fast_length
(metal::fast::length
) and simd_precise_length
(metal::precise::length
) seems not to work.
FB8882598
fno-fast-math
In addition to the behavior in simd_length
, I am aware of cases where Metal seems to ignore disabling ffast-math
and continues illegal fp optimization.
FB8880572
Error Domain=MTLCaptureError Code=1 "Capturing is not supported.
Add MetalCaptureEnabled=1
to Info.plist.
Apple documents that this happens automatically, but I’m aware of some cases where it doesn’t.
FB7870713 – works as designed
Generally caused by trying to capture in an unusual environment, like in unit tests. The workaround is to set captureDescriptor.destination = .gpuTraceDocument
rather than the default .developerTools
Note that captures taken in this way may not be replayed against the simulator.
Sometimes functions fail to compile (e.g., at runtime, when you create a PSO). In your application, this usually presents as some CompilerError
. Separately, a crash report for MTLCompilerService
The two CompilerErrors
I’m aware of are
For more information, dig into the appropriate MTLCompilerService crash report. The backtrace may identify a particular affected GPU, as many of these issues are GPU-specific.
May have too many local variables. Interestingly, shader validator appears to work around this issue, for reasons unknown.
Related to control flow analysis on AMD. simplifying control flow may help.
May be related to vertex or fragment functions with very bad typesignatures. One case I’m aware of involves swapping a vertex and fragment function.
Possible duplicate of libigc.dylib: llvm::MemoryDependenceResults::removeInstruction(llvm::Instruction*) + 1390
May be related to use of null pointers on Intel.
May be related to Shader Validator.
For various reasons, the Metal Debugger often fails to attach. Below are some errors and their possible causes.
Can be caused by a GPU abort or IOAF that took place during the capture. Workaround is to not do that.
Has other causes. I believe that generlaly, the debugger is struggling with memory layout. To fix this, simplify the memory layout, at least while you’re trying to debug:
reorder structs
Variant error has “NSCocoaErrorDomain:260” in the subhead.
It appears that Xcode 12.3 / iOS 14.3 has added a diagnostic identifying a particular function at issue:
LLVM ERROR: Undefined symbol: _Z17SadTromboneOpaquev
This seems to be caused by some problem locating sourcecode or debugging info. One reason this can occur is if you are linking in a static library made with metal-libtool
. Interestingly, this can occur even if the symbols in the library are not referenced by the shader being debugged. Maybe referenced by a different shader in the capture, or just in the library itself?
The workaround is to compile in a local version of the library into your metal target.
An additional cause of this issue is calling a __attribute__((pure))
function that is relocated or optimized out by the compiler. To work around this, declare the function without pure, or avoid calling functions that will be optimized out.
Appears to be a hung system process. For me the issue is usually in macOS rather than xcode or on device, so rebooting clears it. Collecting more data on this one.
An additional root cause is performing a metal capture programmatically during application launch. The workaround is to perform the capture “later”, such as with .asyncAfter
.
FB7741457 - works as intended
I am aware of some cases where Xcode will beachball after “build succeeded”. This usually takes progressively longer and longer each time, and may be associated with taking GPU captures. The workaround is to restart xcode.
Xcode can crash when starting a metal debugging session. This usually happens with
KERN_INVALID_ADDRESS at 0x0000000000000020
EXC_CORPSE_NOTIFY
Dispatch queue: GPUShaderDebuggerSession.queue (QOS: USER_INITIATED)
Thread 36 Crashed:: Dispatch queue: GPUShaderDebuggerSession.queue (QOS: USER_INITIATED)
0 com.apple.GPUToolsPlatformSupport-iOS 0x000000015013e6dc 0x14ff6c000 + 1910492
1 com.apple.GPUToolsPlatformSupport-iOS 0x000000015013d884 0x14ff6c000 + 1906820
2 com.apple.GPUToolsPlatformSupport-iOS 0x000000015013e771 0x14ff6c000 + 1910641
3 com.apple.GPUToolsPlatformSupport-iOS 0x000000015013eb99 0x14ff6c000 + 1911705
4 com.apple.GPUToolsPlatformSupport-iOS 0x000000015013ef01 0x14ff6c000 + 1912577
5 com.apple.dt.gpu.GPUDebugger 0x000000012ac4571a -[GPUShaderDebuggerDataSource variablesForExecutionHistoryNode:] + 110
6 com.apple.dt.gpu.GPUDebugger 0x000000012abdc17f __88-[GPUShaderDebuggerSession variableSnapshotsForExecutionHistoryNodes:completionHandler:]_block_invoke + 260
7 com.apple.Foundation 0x00007fff343a9ac5 __NSBLOCKOPERATION_IS_CALLING_OUT_TO_A_BLOCK__ + 7
This is typically caused by some problem understanding shader parameters. In particular, void*
pointers can cause this.
When debugging a Metal shader, sometimes xcode will say a particular variable holds the value n/a
. Usually the lldb pane has the real value of the variable.
IOAF codes are generally something going on in the GPU driver. Apple apparently deliberately does not document what they are, and I get the feeling that for the non-Apple GPUs they are really generated by third-party code. According to apple,
If you’re seeing one, it’s probably an Apple kernel or driver bug and you should file a bug
Generic error, but usually an invalid device load/store
thread may have entered an infinite loop
Generic error, but usually an invalid device load/store
Generic error, but usually an invalid device load/store
Render target had no texture