
@JFarAur JFarAur commented Mar 24, 2025

Checklist

Which kind of PR do you create?

  • This PR only contains minor fixes.
  • This PR contains major feature update.
  • This PR introduces a new function/api for Qiling Framework.

Coding convention?

  • The new code conforms to Qiling Framework naming convention.
  • The imports are arranged properly.
  • Essential comments are added.
  • The reference of the new code is pointed out.

Extra tests?

  • No extra tests are needed for this PR.
  • I have added enough tests for this PR.
  • Tests will be added after some discussion and review.

Changelog?

  • This PR doesn't need to update Changelog.
  • Changelog will be updated after some proper review.
  • Changelog has been updated in my PR.

Target branch?

  • The target branch is dev branch.

Summary

In this PR, the following is addressed for Windows emulation on x86 and x86_64:

  • Fixed several issues preventing C++ runtime initialization.
  • Programs using the C++ runtime and standard input/output streams are emulated with better correctness.
  • Improved support for several C++ runtime features, including typeid and dynamic_cast.
  • Added support for forwarded exports to the PE loader.
    • This fixes many Windows API functions. For example, interlocked SList operations (InterlockedPushEntrySList, etc.) now function properly.
    • Hooks on forwarded functions cannot be easily "bypassed" by calls to a forwarding target. For example, calls to RtlAllocateHeap and HeapAlloc will trigger the same Qiling hook.
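
A minimal sketch of forwarder resolution, not Qiling's loader code. A forwarded export holds a string such as NTDLL.RtlAllocateHeap instead of code, so resolving it to the target's address before hooks are looked up is what lets both names reach one hook. The export tables, the address, and the helper name resolve_export are hypothetical.

```python
from typing import Dict, Union

# Hypothetical export tables: a value is either an address (int) or a
# forwarder string of the form "MODULE.Symbol".
Export = Union[int, str]
EXPORTS: Dict[str, Dict[str, Export]] = {
    "kernel32": {"HeapAlloc": "NTDLL.RtlAllocateHeap"},
    "ntdll": {"RtlAllocateHeap": 0x180012340},
}


def resolve_export(module: str, symbol: str, max_depth: int = 8) -> int:
    """Follow forwarder strings until a real address is found."""
    for _ in range(max_depth):
        entry = EXPORTS[module.lower()][symbol]
        if isinstance(entry, int):
            return entry
        # Forwarder: split "NTDLL.RtlAllocateHeap" into module and symbol.
        module, symbol = entry.split(".", 1)
    raise RecursionError("forwarder chain too deep")


# Hooks are keyed by resolved address, so both names share one hook.
hooks = {resolve_export("ntdll", "RtlAllocateHeap"): "heap_alloc_hook"}
assert hooks[resolve_export("kernel32", "HeapAlloc")] == "heap_alloc_hook"
```

Keying hooks by the resolved address rather than by name is one way to get the "cannot be bypassed" property described above.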

The following is also addressed, on x86_64 only:

  • Improved support for software user-mode exceptions, including C++ exceptions. Programs that use try/catch are emulated with better correctness.

Details

(Original 24 March)
Previously, Qiling hooked some functions in msvcrt.dll (_initterm, _initterm_e and __acrt_iob_func) so that they would return right away. This worked fine for programs using the C runtime library, as the startup code was not necessary when running in emulation. However, it caused problems for C++ programs, namely those using C++ runtime DLLs such as msvcp140.dll. During my testing, a program printing to std::cout or std::cerr using operator<< was unable to print to the terminal. The standard streams depend on some global variables, and these were left uninitialized because the hooks prevented the CRT startup code from completing.

Leaving these functions unhooked, and implementing HeapReAlloc and _realloc_base allowed my test C++ program to run without crashing. Qiling loads the C++ runtime library without crashing, and the program is able to print using the standard input/output streams. However, removing the CRT startup hooks breaks msvcrt.dll DllMain, which crashes when a program that uses msvcrt loads the module.

After some analysis, I found that with the Qiling CRT startup hooks removed, CRT startup in programs using msvcrt.dll fails because part of the C runtime startup code obtains an invalid pointer from RtlPcToFileHeader.

After implementing a hook for RtlPcToFileHeader in Qiling, the issue is resolved. Both msvcrt and msvcp140 DllMain load without errors, and my test C++ program can successfully print to stdout and stderr. C and C++ programs, even those compiled with recent compilers, initialize CRT successfully and can print to the terminal.

With these additional winapi implementations, the stub hooks for _initterm, _initterm_e and __acrt_iob_func are no longer needed as a stopgap solution, and the library versions of these functions can run without problems.

(Update 27 March)
Currently, Qiling has limited support for software exceptions. Windows programs using SEH or C++ language features such as try/catch are not emulated correctly.

After some analysis, I have found that the main obstacle in emulating software exception handling in Qiling is in several functions in ntdll which make use of various global structures, caches, or loader data, which are in an invalid state during emulation because Qiling does not fully emulate Windows kernel initialization or the loader process.

The most important functions in question are RtlLookupFunctionEntry and RtlLookupFunctionTable. These functions are used for looking up compiler-generated data used during exception handling, including function locations and stack unwinding instructions.
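
As a rough illustration of what such a lookup has to do (a sketch, not the hook added in this PR): on x86_64 the compiler-generated function table is a list of entries sorted by begin address, each giving a function's begin RVA, end RVA, and unwind data RVA, so a binary search on the RVA of the program counter is enough. The type and function names and all addresses below are hypothetical, and details such as chained unwind info are ignored.

```python
import bisect
from typing import List, NamedTuple, Optional


class RuntimeFunction(NamedTuple):
    begin_rva: int   # start of the function, relative to the image base
    end_rva: int     # one past the last byte of the function
    unwind_rva: int  # location of the unwind data for this function


def lookup_function_entry(table: List[RuntimeFunction], image_base: int,
                          pc: int) -> Optional[RuntimeFunction]:
    """Find the table entry covering pc; the table is sorted by begin_rva."""
    rva = pc - image_base
    begins = [entry.begin_rva for entry in table]
    idx = bisect.bisect_right(begins, rva) - 1
    if idx >= 0 and table[idx].begin_rva <= rva < table[idx].end_rva:
        return table[idx]
    return None  # leaf function or address outside any known function


table = [
    RuntimeFunction(0x1000, 0x1040, 0x9000),
    RuntimeFunction(0x1040, 0x10A0, 0x9010),
    RuntimeFunction(0x2000, 0x2100, 0x9020),
]
entry = lookup_function_entry(table, 0x140000000, 0x140001050)
assert entry is not None and entry.unwind_rva == 0x9010
```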

After implementing hooks for these functions, many of the functions involved with exception handling, including RtlRaiseException, RtlDispatchException, and stack unwinding functions such as RtlVirtualUnwind and RtlUnwindEx, are actually emulated correctly. Handler routines are correctly located and executed.

However, there is another obstacle. After the exception is handled and control is returned to the dispatcher, RtlUnwindEx calls RtlRestoreContext. In the case of C++ exceptions, this function calls RcConsolidateFrames, which recursively consolidates stack frames before finally restoring the new context with an IRETQ instruction.

Due to the way Qiling sets up the GDT on x86_64, the IRETQ instruction at this point resulted in CPU faults. After making some adjustments to the GDT setup on x86_64, the context switch occurs as expected. All of the tests (at least those I have access to) are still passing, so it seems this new GDT setup does not break anything for other platforms. I believe it is technically more correct on x86_64, also.
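
For readers unfamiliar with the structure involved: IRETQ pops CS and SS selectors from the stack, and the CPU validates them against their GDT descriptors, so a descriptor with the wrong bits can fault at that point. This comment does not say which fields were adjusted, so the sketch below only shows the generic 8-byte segment descriptor layout and how a 64-bit code segment sets the L flag. The helper name make_descriptor is hypothetical.

```python
FLAG_L = 0x2   # long mode: segment contains 64-bit code
FLAG_DB = 0x4  # default operand size (must be clear when L is set)
FLAG_G = 0x8   # granularity: limit is counted in 4 KiB pages


def make_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack a legacy-format segment descriptor into a 64-bit integer."""
    desc = limit & 0xFFFF                  # limit bits 15:0
    desc |= (base & 0xFFFFFF) << 16        # base bits 23:0
    desc |= (access & 0xFF) << 40          # present, DPL, type
    desc |= ((limit >> 16) & 0xF) << 48    # limit bits 19:16
    desc |= (flags & 0xF) << 52            # AVL, L, D/B, G
    desc |= ((base >> 24) & 0xFF) << 56    # base bits 31:24
    return desc


# Ring-0 64-bit code segment: access 0x9A = present, DPL 0, code, readable.
code64 = make_descriptor(0, 0xFFFFF, 0x9A, FLAG_G | FLAG_L)
assert code64 == 0x00AF9A000000FFFF
```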

Regardless, in my tests, some simple programs which make use of C++ exceptions are now correctly emulated.

Author

JFarAur commented Mar 24, 2025

Some more information: the _realloc_base hook with proper error checking is necessary because the CRT sometimes calls realloc with block = NULL, in which case it should behave like a regular malloc. If _realloc_base is not hooked, these allocations fail.

[FlsGetValue(dwFlsIndex = 0x2) = 0x0]
[SetLastError(dwErrCode = 0)]
[EnterCriticalSection(lpCriticalSection = 0x1806f0e90) = 0x0]
[_realloc_base(block = 0, size = 0x100) = 0x5000071fc]
[memset(dest = 0x5000071fc, c = 0, count = 0x100) = 0x5000071fc]
[_free_base(address = 0)]
[LeaveCriticalSection(lpCriticalSection = 0x1806f0e90) = 0x0]
[api __acrt_iob_func (ucrtbase) is not implemented]
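
The NULL-block case in the trace above can be sketched with a toy heap. This is only an illustration of the realloc contract the hook has to honor, not the hook itself; the ToyHeap class, its base address, and its bump-allocation strategy are made up for the example.

```python
class ToyHeap:
    """Toy bump allocator; addresses and layout are hypothetical."""

    def __init__(self, base: int = 0x500000000) -> None:
        self._next = base
        self._blocks = {}  # address -> bytearray

    def alloc(self, size: int) -> int:
        addr = self._next
        self._blocks[addr] = bytearray(size)
        self._next += max(size, 1)
        return addr

    def free(self, addr: int) -> None:
        # Freeing NULL or an unknown address is a no-op in this sketch.
        self._blocks.pop(addr, None)

    def realloc(self, block: int, size: int) -> int:
        if block == 0:
            return self.alloc(size)   # NULL block: behave like malloc
        if block not in self._blocks:
            return 0                  # unknown pointer: fail with NULL
        if size == 0:
            self.free(block)          # MSVC realloc(p, 0) frees p
            return 0
        new = self.alloc(size)
        old = self._blocks[block]
        count = min(len(old), size)
        self._blocks[new][:count] = old[:count]  # keep existing contents
        self.free(block)
        return new


heap = ToyHeap()
ptr = heap.realloc(0, 0x100)  # mirrors _realloc_base(block = 0, size = 0x100)
assert ptr != 0
```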

Author

JFarAur commented Mar 24, 2025

Regarding RtlPcToFileHeader: this function is very important because it is used for various things, for example when setting up SEH during CRT init, in _CxxThrowException, and in the implementations of typeid and dynamic_cast. With it implemented in Qiling, I would expect C++ programs to work a lot better now.

Author

JFarAur commented Mar 24, 2025

I would like to elaborate on my previous comments. After doing some more investigation, I can provide very detailed information about the problems that were occurring previously, and why these changes are beneficial.

First, the _initterm function's job is essentially to loop over an array of function pointers and execute them. For msvcrt.dll, it appears that the only function executed is responsible for initializing the static member std::shared_ptr<__ExceptionPtr> m_badAllocExceptionPtr in __ExceptionPtr:

shared_ptr<__ExceptionPtr> __ExceptionPtr::_InitBadAllocException()
{
    std::bad_alloc _Except;
    return __ExceptionPtr::_CopyException(&_Except, static_cast<const ThrowInfo*>(__GetExceptionInfo(_Except)), false);
}

A std::bad_alloc object is constructed on the stack, then __ExceptionPtr::_CopyException is called to turn this object, together with a ThrowInfo* obtained from __GetExceptionInfo, into a std::shared_ptr<__ExceptionPtr>.

__GetExceptionInfo seems to be a compiler builtin function. In IDA we can see that this becomes a pointer to a static ThrowInfo struct corresponding to the std::bad_alloc class.

loc_110117DA0:
call    ??0exception@@QEAA@AEBQEBDH@Z ; exception::exception(char const * const &,int)
lea     rdi, ??_7bad_alloc@std@@6B@ ; const std::bad_alloc::`vftable'
mov     [rsp+58h+var_28], rdi
xor     r9d, r9d
lea     r8, _TI2?AVbad_alloc@std@@ ; throw info for 'class std::bad_alloc'
lea     rdx, [rsp+58h+var_28]
mov     rcx, rbx

Inside __ExceptionPtr::_CopyException is the following code:

#if _EH_RELATIVE_OFFSETS && !defined(_M_CEE_PURE)
        PVOID ThrowImageBase = RtlPcToFileHeader((PVOID)pTI, &ThrowImageBase);
        PER_PTHROWIB(pExcept) = ThrowImageBase;
#endif

We can see that RtlPcToFileHeader is used to obtain the image base of the module where the ThrowInfo struct is located. This is used to initialize the ImageBase field of the ExceptionRecord.

Previously, without the _initterm stub hooks, this code would run, but ImageBase would be populated with an invalid pointer, which would result in an error at a later point. Below is output from a program that loads msvcrt.dll, showing the use of RtlPcToFileHeader and the error shortly thereafter.

[!]     api _initterm (msvcrt) is not implemented
[!]     api ??0exception@@QEAA@AEBQEBDH@Z (msvcrt) is not implemented
[!]     api RtlPcToFileHeader (ntdll) is not implemented
[!]     api ZwQueryVirtualMemory (ntdll) is not implemented
[=]     malloc(size = 0xa0) = 0x500009842
[=]     memset(dest = 0x500009882, c = 0, count = 0x58) = 0x500009882
[x]     Error encountered while running msvcrt.dll DllMain, bailing
[=]     Done loading msvcrt.dll

The ntdll implementation of RtlPcToFileHeader looks up the address in the inverted function tables, and calls ZwQueryVirtualMemory. However this syscall is not implemented in Qiling, and RtlPcToFileHeader will return NULL.
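
In an emulator that already tracks its loaded images, the same answer can be computed directly from that list. A minimal sketch, assuming a list of (name, base, size) records: the Image type and pc_to_file_header name are hypothetical, the msvcrt base and PcValue are the ones from the log in this comment, and the image sizes and the second module are invented. The real function also writes the result through its BaseOfImage pointer, which is omitted here.

```python
from typing import List, NamedTuple, Optional


class Image(NamedTuple):
    name: str
    base: int
    size: int


def pc_to_file_header(images: List[Image], pc_value: int) -> Optional[int]:
    """Return the base of the image containing pc_value, or None (NULL)."""
    for image in images:
        if image.base <= pc_value < image.base + image.size:
            return image.base
    return None


images = [
    Image("msvcrt.dll", 0x110100000, 0xA0000),   # size is an assumption
    Image("example.dll", 0x180000000, 0x10000),  # made-up module
]
assert pc_to_file_header(images, 0x110186998) == 0x110100000
assert pc_to_file_header(images, 0x500009842) is None  # heap address
```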

With a correct implementation of RtlPcToFileHeader, the CRT startup code can run without problems:

[!]     api _initterm (msvcrt) is not implemented
[!]     api ??0exception@@QEAA@AEBQEBDH@Z (msvcrt) is not implemented
[=]     RtlPcToFileHeader(PcValue = 0x110186998, BaseOfImage = 0x80000001cea8) = 0x110100000
[=]     malloc(size = 0xa0) = 0x500009842
[=]     memset(dest = 0x500009882, c = 0, count = 0x58) = 0x500009882
[=]     EncodePointer(Ptr = 0x110186998) = 0x110186998
...
[=]     atexit(func = 0x110175680) = 0x0
[=]     Returned from msvcrt.dll DllMain
[=]     Done loading msvcrt.dll

This also means that the stub hooks for _initterm and _initterm_e to prevent the crash are unnecessary.

For msvcp140.dll, the situation with _initterm is more complex. The array given to _initterm is larger, and contains dynamic initializers of standard input/output streams such as cout, cerr, and cin. Skipping _initterm means that these initializers do not run, so the input/output streams are still uninitialized when programs attempt to use them, resulting in errors, most often null pointer dereferences. This is why programs using the C++ runtime library and standard input/output streams were unstable when running in Qiling.

However, with fixed RtlPcToFileHeader and removal of the _initterm and _initterm_e stubs, the C++ runtime library can correctly call its dynamic initializers, including for the standard input/output streams.
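
To make the effect of the old stubs concrete, here is a small model of the two functions: _initterm calls every non-null entry of a function-pointer table in order, and _initterm_e does the same but stops at the first non-zero return value. The initializer functions and the streams dictionary are hypothetical stand-ins for the real dynamic initializers.

```python
from typing import Callable, Dict, Optional, Sequence

Initializer = Optional[Callable[[], int]]


def initterm(table: Sequence[Initializer]) -> None:
    """Call every non-null entry in order, ignoring return values."""
    for fn in table:
        if fn is not None:
            fn()


def initterm_e(table: Sequence[Initializer]) -> int:
    """Like initterm, but stop at the first non-zero return and report it."""
    for fn in table:
        if fn is not None:
            rc = fn()
            if rc != 0:
                return rc
    return 0


def stubbed_initterm(table: Sequence[Initializer]) -> None:
    """Model of the old hook: return right away without running anything."""
    return None


streams: Dict[str, str] = {}


def init_cout() -> int:
    streams["cout"] = "ready"
    return 0


def init_cerr() -> int:
    streams["cerr"] = "ready"
    return 0


stubbed_initterm([init_cout, None, init_cerr])
assert streams == {}  # nothing initialized: later stream use would fail
initterm([init_cout, None, init_cerr])
assert streams == {"cout": "ready", "cerr": "ready"}
```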

Note that RtlPcToFileHeader is also used to fill in the ExceptionRecord in _CxxThrowException, which is used any time a C++ exception is thrown anywhere:

#if _EH_RELATIVE_OFFSETS
        PVOID ThrowImageBase = RtlPcToFileHeader((PVOID)pTI, &ThrowImageBase);
        ThisException.params.pThrowImageBase = ThrowImageBase;
#endif

RtlPcToFileHeader is also used in several other important functions in the Microsoft C++ runtime implementation, including _RTDynamicCast and _RTtypeid. Implementing the RtlPcToFileHeader hook allows these functions to work properly. (Well, in the case of _CxxThrowException, at least until RtlRaiseException is called, as Qiling does not have support for SEH right now.)

@JFarAur JFarAur marked this pull request as draft March 27, 2025 17:15
Author

JFarAur commented Mar 27, 2025

Update: I have since built on this work and added some preliminary support for user-mode C++ runtime exceptions on 64-bit Windows.

I turned this PR into a draft for now. I will update it with more changes soon.

@JFarAur JFarAur changed the title from "Some fixes to improve execution for Windows binaries that use C++ runtime libraries" to "Improved support for C++ runtime libraries on 64-bit Windows" Mar 27, 2025
Author

JFarAur commented Mar 27, 2025

There is currently an issue where C++ exceptions are still not quite working because of the RaiseException hook. If the RaiseException hook is removed, C++ exceptions are emulated correctly. However, I am pretty sure this is incompatible with the previous code for setting unhandled exception filters, which I did not want to break.

Likewise, typeid and dynamic_cast are still not 100% there, but they are very close to working properly. I think these can be fixed with another small hook or two.

I will try to figure these out later.

Author

JFarAur commented Mar 28, 2025

After the latest changes, typeid and dynamic_cast are now functioning correctly.

Some of the tests were failing on x86. It turned out that user32 DllMain was running for longer than before, since some APIs now function properly, and was then crashing in a different manner that made the tests fail. I added user32 to the DllMain blacklist because it was expected to fail in the tests before anyway, and allowing it to run at all was causing more issues than it was worth fixing.

@JFarAur JFarAur changed the title from "Improved support for C++ runtime libraries on 64-bit Windows" to "Various fixes and improvements for Windows emulation" Mar 28, 2025
Author

JFarAur commented Mar 29, 2025

To resolve the conflict with the existing unhandled exception filter code, a new hook for ZwRaiseException was added which forwards second-chance exceptions to the registered unhandled exception filter. The hook for RaiseException can then be safely removed, as the native functions involved with first-chance exception dispatching are now functioning mostly correctly.
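
The split between first-chance and second-chance handling can be sketched as below. This is a toy decision function under stated assumptions, not the hook from this PR: the function name, the dict-based exception record, and the returned labels are all hypothetical. Only the exception code 0xE06D7363 (used by MSVC for C++ throws) is a real value.

```python
from typing import Callable, List, Optional

CPP_EXCEPTION_CODE = 0xE06D7363  # exception code MSVC uses for C++ throws


def on_zw_raise_exception(record: dict, first_chance: bool,
                          unhandled_filter: Optional[Callable[[dict], int]]) -> str:
    """Decide what a hypothetical ZwRaiseException hook does with an exception."""
    if first_chance:
        # First chance: leave dispatching to the native ntdll code.
        return "native-dispatch"
    if unhandled_filter is not None:
        # Second chance: no handler took it, so call the registered filter.
        unhandled_filter(record)
        return "unhandled-filter"
    return "terminate"


seen: List[dict] = []


def my_filter(rec: dict) -> int:
    seen.append(rec)
    return 1  # placeholder filter result


record = {"code": CPP_EXCEPTION_CODE}
assert on_zw_raise_exception(record, True, my_filter) == "native-dispatch"
assert on_zw_raise_exception(record, False, my_filter) == "unhandled-filter"
assert seen == [record]
```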

The result of this work is that C++ exceptions and try/catch now function more or less correctly.

Member

@elicn elicn left a comment

This is a great contribution, thank you!
Really appreciating the time you took to comment and clarify what the code is doing and why.
Please see my comments and questions.

Member

elicn commented Mar 30, 2025

Would you be able to create test cases for the exception handling?
(e.g. exception caught, exception caught through a chain, unhandled exception, etc.)

Author

JFarAur commented Mar 30, 2025

> Would you be able to create test cases for the exception handling? (e.g. exception caught, exception caught through a chain, unhandled exception, etc.)

I've been working on some test cases, I will add them soon :)

Thanks for looking it over, I will incorporate your feedback and try to answer your questions.

Member

elicn commented Mar 30, 2025

Just note that passthru means the hook implementation is ignored and is used only for its declaration.

Member

elicn commented Mar 31, 2025

> That doesn't seem to be the case in my tests, it seems like Qiling just uses passthru as a flag to not do rewinding after the hook code finishes.

Tagging a hook as a passthru means it is there only to show the call details, and there should not be an alternate implementation. Rather it continues to the real implementation (that is, doesn't patch the program counter so it keeps going to where it meant to go). Having a hook implementation tagged as passthru is kind of a hack - just FYI.

It may be worth documenting that this is your desired behavior, so it won't fail when the code changes in the future.
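
A toy model of the behavior discussed in this exchange, assuming the observed semantics (the hook body runs, then execution continues into the real implementation instead of returning to the caller). This is not Qiling's dispatcher; the Hook type, call_api function, and trace list are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Hook(NamedTuple):
    fn: Callable[..., int]
    passthru: bool


def call_api(name: str, args: Tuple, hooks: Dict[str, Hook],
             native: Dict[str, Callable[..., int]], trace: List[str]) -> int:
    """Dispatch one API call through an optional hook."""
    hook = hooks.get(name)
    if hook is None:
        return native[name](*args)
    trace.append(f"{name}{args}")   # the declaration is used to log the call
    result = hook.fn(*args)
    if hook.passthru:
        # No rewind: fall through to the real implementation, whose
        # return value is what the caller sees.
        return native[name](*args)
    return result                   # regular hook: replaces the native code


native = {"strlen": lambda s: len(s)}
trace: List[str] = []
replaced = {"strlen": Hook(lambda s: -1, passthru=False)}
passed = {"strlen": Hook(lambda s: -1, passthru=True)}
assert call_api("strlen", ("abc",), replaced, native, trace) == -1
assert call_api("strlen", ("abc",), passed, native, trace) == 3
```

Because the hook body still runs in this model, any side effects it has would apply even for a passthru hook, which is why documenting the intent matters.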

Member

elicn commented Mar 31, 2025

> Set of tests has been added, they depend on some binaries which I added to the rootfs: qilingframework/rootfs#34

Thanks! We'll need to author test cases for them.
Please be sure to include also the source files and Makefiles in the appropriate directories.

Author

JFarAur commented Mar 31, 2025

> > Set of tests has been added, they depend on some binaries which I added to the rootfs: qilingframework/rootfs#34
>
> Thanks! We'll need to author test cases for them. Please be sure to include also the source files and Makefiles in the appropriate directories.

Test cases added here: 7dd9fcd
Sources added here: 8f92c73

Member

xwings commented Apr 1, 2025

Hi @sakura57, thanks for the contribution and welcome to Qiling.

This looks amazing. It's a huge PR and I might need some time to take a look.

For now, I will go ahead and approve the PR in rootfs and enable the CI.

Author

JFarAur commented Apr 1, 2025

> Hi @sakura57 thanks for the contribution and welcome to Qiling.
>
> This looks amazing. its a huge PR and i might need sometime to take a look.
>
> as of now i will go ahead to approve the PR in rootfs and enable the CI.

Thanks for the reply, I am looking forward to your feedback. :)

I think I can see why the tests are failing as well. There is a problem related to native HeapValidate being called, and a slight problem with unhandled exceptions on 32-bit, which is why al-khaser is failing. I think it is fixable without too much work; I will have a look later.

Author

JFarAur commented Apr 2, 2025

After some changes yesterday, wannacry is working properly and al-khaser is starting. I know why al-khaser is failing, and it won't be a big problem to fix. There is an issue with the clipboard test, though.

The clipboard test binary was compiled in Debug mode, and uses debug versions of the C/C++ runtime DLLs. These include a lot of additional error checking. For example, there are debug versions of all the memory functions which allocate a little extra space at the beginning and end of all memory buffers, and fill them with special values. If any changes are detected, then this is taken to mean the heap is corrupted, and the CRT terminates with assertion failures. This is the main reason why the clipboard test is not working right now.
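
The guard-byte idea can be sketched as follows. The fill values match the ones the MSVC debug heap is known for (0xFD for the guard regions, 0xCD for fresh user memory) and the 4-byte guard size is the usual one, but the real debug header carries more bookkeeping than this; the function names here are hypothetical.

```python
NO_MANS_LAND = 0xFD  # fill byte for the guard regions around a block
CLEAN_LAND = 0xCD    # fill byte for freshly allocated user memory
GUARD_SIZE = 4


def debug_alloc(size: int) -> bytearray:
    """Return guard + user data + guard, filled the way a debug heap would."""
    guard = bytes([NO_MANS_LAND]) * GUARD_SIZE
    return bytearray(guard + bytes([CLEAN_LAND]) * size + guard)


def guards_intact(block: bytearray) -> bool:
    """Check that neither guard region has been overwritten."""
    guard = bytes([NO_MANS_LAND]) * GUARD_SIZE
    return bytes(block[:GUARD_SIZE]) == guard and bytes(block[-GUARD_SIZE:]) == guard


block = debug_alloc(8)
assert guards_intact(block)

# Writing 9 bytes into an 8-byte buffer clobbers the first trailing guard byte,
# which is what a hook that ignores the destination size would do.
block[GUARD_SIZE:GUARD_SIZE + 9] = b"A" * 9
assert not guards_intact(block)
```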

I think a solution for this could be to allow all the native CRT memory functions to run. These depend ultimately on the kernelbase Heap* functions. As long as those implementations are robust, then all the error checking logic should work properly in theory.

The test is still failing after I tried these changes, though. I see two possibilities: either the CRT is doing its job correctly and is revealing a real buffer overrun somewhere in the Qiling hooks, or Qiling is somewhere passing allocated memory to the program that the CRT debug libs expect to carry the debug header/footer, but which does not.

That said, I do think it is slightly strange to include a debug build in the test suite, since debug builds are not suitable for production, even for malware. Not everyone is guaranteed to have the -d versions of the CRT DLLs on their system; they are usually included with Visual Studio. This is another reason the test could ultimately be failing: debug binaries should really be run with the exact builds of the debug libs they were linked against. @xwings, was it intentional to include a debug binary in the tests?

Member

elicn commented Apr 2, 2025

At the end of the day this is an emulation framework, so we strive to emulate as much as we can rather than simulate. If you notice a working API implementation, we can safely remove the hook -- or better, make it a pass-through stub.

As for the debug versions, we need to support them too. There is no good reason for us to exclude them. We can try to work around problematic APIs, or admit we cannot guarantee a flawless emulation, but we need to support it.

Member

elicn commented Apr 2, 2025

To ease your debugging work, be sure to use the trace module (example here)

I am attaching here a small script that helped me debug al-khaser back then, by controlling the trace output. File extension changed to allow upload.
run-al-khaser.py.txt

Author

JFarAur commented Apr 2, 2025

My feeling was correct, the CRT assertions were detecting a real buffer overrun. It turned out to be from the LCMapString* functions, this has been addressed in a3bccf5.

I have cleaned up much of the code for msvcrt, keeping in mind to make hooks passthru when native APIs can be used instead of removing them, and allowing the native CRT memory functions to run. This is important for the debug CRT and its additional heap error checking to work correctly.

With these changes the tests are passing again, on my end at least.

For al-khaser, I reverted part of the original RaiseException hook: on x86 only, the unhandled exception filter is called as before. So on x86 the situation with exceptions is basically unchanged, whereas on x86_64 the native exception dispatching code is used and software exceptions mostly work correctly with the new changes.

Member

xwings commented Apr 13, 2025

@elicn are we good to merge ?

Member

elicn commented Apr 14, 2025

@sakura57, is this PR ready to merge, or are you still working on it?

Author

JFarAur commented Apr 14, 2025

> @sakura57, is this PR ready for merge, or you are still working on it?

@elicn Ready, I think it's in a good state right now.

It has been getting some "battle testing" over the past few weeks as I'm using this branch for my particular use case, and I haven't noted any problems since the last fixes.

Member

elicn commented Apr 21, 2025

@xwings we are good to go.

@xwings xwings merged commit d614855 into qilingframework:dev May 6, 2025
4 checks passed