Tech Editorials #Vulnerability #Windows #Advisory #CVE

From Convenience to Contagion: The Half-Day Threat and Libarchive Vulnerabilities Lurking in Windows 11

Terrynini 2025-02-12

Abstract

In the October 2023 update, Windows 11 introduced support for 11 additional compression formats, including RAR and 7z, allowing users to manage these types of files natively within File Explorer. The enhancement significantly improves convenience; however, it also introduces potential security risks. To support these various compression formats, Windows 11 utilizes the libarchive library, a well-established open-source library used across multiple operating systems like Linux, BSD, and macOS, and in major projects such as ClickHouse, Homebrew, and Osquery.

Libarchive has been continuously fuzzed by Google’s OSS-Fuzz project, making it a time-tested library. However, its coverage in OSS-Fuzz has been less than ideal. In addition to the two remote code execution (RCE) vulnerabilities disclosed by Microsoft Offensive Research & Security Engineering (MORSE) in January 2024, we identified several vulnerabilities in libarchive through code review and fuzzing. These include a heap buffer overflow in RAR decompression, and arbitrary file write and delete vulnerabilities caused by insufficient checks of libarchive’s output on Windows. We will also describe several interesting behaviors that emerged from the integration of libarchive with Windows.

Whenever vulnerabilities are discovered in a widely used library like libarchive, the risk spreads to every project that depends on it, making the potential impact difficult to estimate. Moreover, when Microsoft patches Windows, the corresponding fixes are not immediately merged into libarchive. This delay gives attackers the opportunity to exploit other projects that use libarchive. For example, the vulnerabilities patched by Microsoft in January were not merged into libarchive until May, leaving countless applications exposed for four months. Worse, their developers might not know the vulnerability details or even be aware that the vulnerabilities exist. To illustrate this situation, we use the vulnerabilities we reported to ClickHouse to demonstrate how attackers can exploit them while libarchive remains unpatched.

Introduction

Before the KB5031455 update, Windows 11 only supported ZIP archives natively. In File Explorer, ZIP files are labeled “Compressed (zipped) Folder.” Users can double-click a ZIP file to view its contents:

Or, even better, add new files to the archive or open existing ones directly:

When a user double-clicks a file inside a ZIP archive, File Explorer extracts it to a temporary folder with a randomly generated UUID under the %temp% directory. The file is accessed from this temporary location, and since it’s a temporary file, it will be automatically deleted later:

Compressed Archive Folder

Next, after the KB5031455 update in October 2023, Windows 11 added support for 11 new archive file formats:

This kind of file is labeled “Compressed Archive Folder” by File Explorer:

Curious about how Windows 11 supports these 11 new archive file formats, we began analyzing File Explorer and the related DLL files. The native support for ZIP in File Explorer is handled by zipfldr.dll.

After the KB5031455 update, a new class called ArchiveFolder was added, distinct from the old CZipFolder class used to support ZIP Files.

First Vulnerability: CVE-2024-26185

Before firing up IDA, we first conducted black-box testing on the new “Compressed Archive Folder” feature. When it comes to extracting files, ../ is a timeless trick.

In the first test case, we constructed a file named ..\poc.txt, compressed it into an RAR file, and then uploaded it to a Windows machine to open it by double-clicking. There was no Path Traversal; we only saw an empty folder:

In the second test case, we constructed a file named 123\..\poc.txt. Because the 123 was canceled out by .., we saw only poc.txt in File Explorer, and still no Path Traversal:

There is also no Path Traversal in the corresponding temp folder:

Besides double-clicking, users will see Extract All if they right-click a “Compressed (zipped) Folder” or “Compressed Archive Folder” in File Explorer. Extract All tries to decompress the whole archive. Let’s test “Compressed Archive Folder” again with that:

When using Extract All, ..\poc.txt is considered an attempt to escape to the parent directory, causing File Explorer to display an error:

The extraction of 123\..\poc.txt is a success, but we still only got the poc.txt.

Because Extract All decompresses the whole archive, we think we should also test the situation when the file name is an absolute path, for example, C:\poc\poc.txt:

There is no error, but the folder C: was renamed to C_. Thus, we now know zipfldr.dll will sanitize the input to avoid Path Traversal or Arbitrary File Write.

But if one double-clicks the RAR file that contains a file with an absolute path name, instead of using Extract All, File Explorer shows a Local Disk (C:) folder! The C: isn’t replaced with C_!

Besides that, everything seemed normal, even when we navigated to the innermost folder and opened the poc.txt file:

Except for the fact that the extra poc folder is under our C volume!

That means File Explorer treats this location as the place to put its temporary files; thus, the poc.txt file is also there, inside the poc folder:

In other words, we have discovered an Arbitrary File Write vulnerability! Since this write operation aims to place temporary files, the files will be deleted after a while. Therefore, what we have actually found is an Arbitrary File Write/Delete vulnerability. But it’s a shame that the permissions used for both writing and deleting are limited to the current user’s privileges.

That’s CVE-2024-26185, a funny yet useless vulnerability. To exploit it to create or delete a file in a specific location, you would need to create the exact same path structure within the archive and then trick the user into opening every folder and double-clicking the target file. Most people would probably find it suspicious halfway through the process.

Well, that may be true, but according to the rules, Microsoft still has to pay me $1,000. Yay!

CVE-2024-26185: Root Cause

The root cause of CVE-2024-26185 is the insufficient filtering of file names. After decompiling the zipfldr.dll, we found it will call replace_invalid_path_chars to sanitize the file name before decompression. The function replaces "*:<>?| with _ and / with \.
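The substitution rule described above can be sketched in a few lines of Python. This is a hypothetical re-implementation based only on the behavior described in this article, not the decompiled routine from zipfldr.dll; the constant name INVALID_CHARS is our own.

```python
# Hypothetical model of the sanitizer described in the text; the real
# routine lives in zipfldr.dll and may differ in details.
INVALID_CHARS = '"*:<>?|'


def replace_invalid_path_chars(name: str) -> str:
    """Replace each of "*:<>?| with '_' and each '/' with a backslash."""
    table = {ord(ch): "_" for ch in INVALID_CHARS}
    table[ord("/")] = "\\"
    return name.translate(table)


# The drive colon is neutralized, which matches the C_ folder seen with Extract All.
print(replace_invalid_path_chars("C:\\poc\\poc.txt"))  # C_\poc\poc.txt
```

Note that backslashes and dots are not in the filtered set, so they pass through unchanged.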

Additionally, when interacting with the “Compressed (zipped) Folder” or “Compressed Archive Folder,” users have three methods to extract files, each triggering a different function:

  • Double-clicking a file inside the archive
    • Triggers ExtractFromArchiveByIndex
  • Double-clicking a cmd, bat, or exe file inside the archive
    • Triggers ExtractEntireArchive
  • Right-clicking the archive and selecting “Extract All” from the menu
    • Triggers ArchiveExtractWizard::ExtractToDestination

All of them use archive_read_next_header to get the file names in the archive, replace_invalid_path_chars to sanitize the file name, and ExtractArchiveEntry to actually extract the file. However, they forgot to call replace_invalid_path_chars in ExtractFromArchiveByIndex, which is triggered when “Double-clicking a file inside the archive,” leading to the arbitrary file write and arbitrary file delete vulnerabilities.

CVE-2024-38165: Bypassing the Patch for CVE-2024-26185

After Microsoft patched CVE-2024-26185, we picked some of the PoCs we had created earlier and ran them to check the patch’s correctness. It turned out that some of our PoCs still worked!?

In the patch, they added a call to replace_invalid_path_chars before ExtractFromArchiveByIndex to sanitize the file name. It looks perfect:

However, it can be easily bypassed by \poc\poc.txt. How does that happen? Let’s follow the code step by step. First, the file name \poc\poc.txt is passed into replace_invalid_path_chars. Since there are no invalid characters in the file name, the output is still \poc\poc.txt:

Next, because zipfldr.dll is currently “extracting file to a temporary folder under the %TEMP% to let users interact with it,” the file name should be concatenated with the path of the temporary folder to construct the destination of extraction:

But here comes the problem: on Windows, a path that starts with \ is rooted, just like one that starts with C:\. In other words, zipfldr.dll is joining two rooted paths! In Microsoft’s STL implementation, when the right-hand argument of std::filesystem::operator/ has a root directory, the directories of the left-hand argument are discarded and only its drive is kept. Thus, the function’s return value is C:\poc\poc.txt, causing a patch bypass.
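The same joining rule can be observed with Python's pathlib, whose PureWindowsPath follows comparable semantics for rooted paths. This is only an analogy to the STL behavior, not the code in zipfldr.dll; the temporary folder name and the stays_inside helper are hypothetical.

```python
from pathlib import PureWindowsPath

# Hypothetical temporary folder; the real one uses a random UUID under %TEMP%.
temp_dir = PureWindowsPath(r"C:\Users\user\AppData\Local\Temp\hypothetical-uuid")


def destination(entry_name: str) -> PureWindowsPath:
    """Join the temp folder and the archive entry name, as the text describes."""
    return temp_dir / entry_name


def stays_inside(dest: PureWindowsPath) -> bool:
    """Lexical check that the result is still below the temp folder."""
    return temp_dir in dest.parents


benign = destination(r"poc\poc.txt")
rooted = destination(r"\poc\poc.txt")

print(rooted)                # C:\poc\poc.txt
print(stays_inside(benign))  # True
print(stays_inside(rooted))  # False
```

A containment check of this kind, performed after the join, is one possible way to catch a rooted entry name. It is purely lexical and does not account for .. components or symlinks.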

Of course, File Explorer is vulnerable when replace_invalid_path_chars is missing. But can we still exploit it even if replace_invalid_path_chars is used correctly? This function only filters "*:<>?|, meaning that . and \ pass through untouched and can still be used to construct a remote path. Could NTLM exfiltration still be possible? We attempted to construct paths such as:

  • \\172.23.176.34\Users\nini\Desktop\sharing\test.txt
  • \Device\Mup\172.23.176.34\Users\nini\Desktop\sharing\test.txt

When entries with these names are regular files, they only create a corresponding directory under the C: volume (before the CVE-2024-38165 fix). However, if we create a symlink pointing to \\172.23.176.34\poc\poc.txt, when the user either double-clicks the symlink or selects “Extract All,” File Explorer will attempt to communicate with the SMB server at that IP address, leading to an NTLM leak:

Moreover, Windows determines the file type within the archive based solely on the file extension, which can be highly misleading. For example, in this case, our symlink file was recognized as a Text Document:

However, zipfldr.dll uses the CreateSymbolicLinkA API to create a symlink during decompression. Although this API requires elevated privileges, File Explorer won’t prompt for privilege escalation and will simply display an error message instead.

Even though File Explorer adds the SYMBOLIC_LINK_FLAG_ALLOW_UNPRIVILEGED_CREATE flag when using CreateSymbolicLinkA, the documentation states that Developer Mode must be enabled for this flag to take effect.

The “extracting symlink from archive” feature appears to be incomplete, limiting the vulnerability to attacks targeting administrators or developers. As a result, it does not meet MSRC’s threshold for immediate servicing. Therefore, they will not provide ongoing updates on the status of the fix and have closed this case.

Libarchive

In the previous section, we mentioned that zipfldr.dll is responsible for handling interactions with the “Compressed (zipped) Folder” and “Compressed Archive Folder.”

In this section, we’ll talk about the archiveint.dll, which is actually a forked version of libarchive. Libarchive is a powerful, open-source library for handling archive file formats. It is used across multiple operating systems like Linux, BSD, and macOS, as well as in projects such as ClickHouse, Homebrew, and Osquery. Google’s OSS-Fuzz project has continuously fuzzed it 24/7 since 2016, making it a time-tested library.

By black-box testing, we observed several interesting behaviors.

Fun Fact 1: Windows Supports More File Formats Than Claimed

Although Microsoft claimed to have added native support for the following 11 archive formats in the KB5031455 update, the actual number of supported file formats far exceeds 11.

This is how Windows initializes libarchive in zipfldr.dll:

In fact, the archive_read_support_format_all function in libarchive enables support for a total of 13 archive formats, including ar, cpio, lha, mtree, tar, xar, warc, 7zip, cab, rar, rar5, iso9660, and zip; the archive_read_support_filter_all function in libarchive enables support for a total of 13 filters, including bzip2, compress, gzip, lzip, lzma, xz, uu, rpm, lrzip, lzop, grzip, lz4, zstd.

In addition, formats and filters can be used simultaneously. For example, the tar format and the gzip filter must both be enabled to support the .tar.gz file format. So, is the total number of file formats natively supported by Windows $13+13+13 \times 13 = 195$?

Completely wrong! A maximum of 25 filters can be chained, e.g., archive.rar.gzip.xz.uu.zstd.uu....... That said, Windows 11 actually supports $13+13+13\times 13^{25}$ types of file formats, which equals 91733330193268616658399616035 formats! For free!
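The arithmetic behind both counts is easy to check. The snippet below simply evaluates the two expressions from the text; the constant names are ours.

```python
FORMATS = 13              # formats enabled by archive_read_support_format_all
FILTERS = 13              # filters enabled by archive_read_support_filter_all
MAX_CHAINED_FILTERS = 25  # maximum filter chain length stated in the text

# One optional filter on top of one format.
naive = FORMATS + FILTERS + FORMATS * FILTERS

# Up to 25 chained filters on top of one format, as counted in the text.
chained = FORMATS + FILTERS + FORMATS * FILTERS ** MAX_CHAINED_FILTERS

print(naive)    # 195
print(chained)  # 91733330193268616658399616035
```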

As a result, the attack surface has significantly expanded after the update. Any vulnerability in a file format within libarchive can be triggered on Windows. Additionally, parsing multiple filters simultaneously could also introduce security weaknesses.

Fun Fact 2: File Format Confusion

When calling libarchive to decompress files, there is no need to specify the archive’s file format; libarchive automatically determines the format based on the content. However, there is a chance that File Format Confusion can happen when ZIP support is enabled. For example, if we create a demo3.rar archive and place a poc.zip file inside, the result will look like this:

If we double-click demo3.rar directly on Windows, we will find that only the poc2.txt file is visible, while the other folders, files, and even poc.zip are missing. This is because libarchive mistakenly identifies demo3.rar as a ZIP file!

To understand the bug, let’s see how libarchive determines the file format for an archive. In choose_format, the bid function of the enabled formats is called, and the format whose bid returns the highest value is the one libarchive will treat the file as.

Take ar’s bid function, archive_read_format_ar_bid, as an example: if the first 8 bytes of an archive are "!<arch>\n", the function returns 64, which is presumably derived from $8\ bits \times 8\ bytes = 64$:

Next, the RAR format’s highest score is only 30. Although it’s unclear how the number was determined, if each file format is checked starting from the beginning of the file, it should be difficult to create a polyglot that would break this mechanism, right?

But, there is a special mode for ZIP in libarchive: seekable. In other words, the ZIP signature does not need to be at the beginning of the file; libarchive will search for it itself. The highest value for a seekable ZIP is 32:

The consequence is that when a RAR archive contains a ZIP file and the RAR’s compression ratio is low enough to leave the ZIP signature untouched, libarchive will incorrectly treat the RAR file as a ZIP file.
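The bidding mechanism and the resulting confusion can be modeled with a small sketch. This is a simplified model, not libarchive's code: the scores 64, 30, and 32 come from the text, while the bidder functions are hypothetical stand-ins, and the seekable ZIP bidder is approximated by searching the whole buffer for the end-of-central-directory signature.

```python
AR_MAGIC = b"!<arch>\n"
RAR_MAGIC = b"Rar!\x1a\x07\x00"
ZIP_EOCD = b"PK\x05\x06"  # ZIP end-of-central-directory signature


def bid_ar(data: bytes) -> int:
    return 64 if data.startswith(AR_MAGIC) else 0


def bid_rar(data: bytes) -> int:
    return 30 if data.startswith(RAR_MAGIC) else 0


def bid_zip_seekable(data: bytes) -> int:
    # Assumption: the signature may appear anywhere, not only at offset 0.
    return 32 if ZIP_EOCD in data else 0


BIDDERS = {"ar": bid_ar, "rar": bid_rar, "zip": bid_zip_seekable}


def choose_format(data: bytes):
    """Return the name of the highest bidder, or None if nobody bids."""
    best = max(BIDDERS, key=lambda name: BIDDERS[name](data))
    return best if BIDDERS[best](data) > 0 else None


plain_rar = RAR_MAGIC + b"compressed member data"
rar_with_stored_zip = RAR_MAGIC + b"member data " + ZIP_EOCD + bytes(18)

print(choose_format(plain_rar))            # rar
print(choose_format(rar_with_stored_zip))  # zip
```

Because 32 beats 30, any RAR file that leaves a ZIP signature intact loses the bid to the seekable ZIP reader in this model.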

Fun Fact 3: Sometimes, Libarchive Tries to Spawn an External Executable

By reviewing the source code of libarchive, we found that if some libraries are missing at compile time, libarchive changes its behavior from using the library to executing an external command to decompress the archive:

Then we decompiled the archiveint.dll, which is the forked libarchive on Windows. We confirmed that the function for decompressing some file formats will try to execute the external binary, e.g., lzop_bidder_init:

Moreover, since libarchive decides which format to use for extraction based on the file content, all we need to do is change the extension of an lzop-compressed file to .rar and double-click it to trigger the corresponding lzop function, lzop_bidder_init:

At the end, we’ll see that explorer.exe is trying to execute lzop in PATH to decompress the archive:

RCEs Reported by MORSE: CVE-2024-20696 and CVE-2024-20697

So far, it’s clear that libarchive introduces numerous attack surfaces. In fact, Windows also patched two vulnerabilities reported by Microsoft’s Offensive Research Security Engineering (MORSE) team in January 2024, specifically the RCE vulnerabilities CVE-2024-20696 and CVE-2024-20697.

CVE-2024-20696: OOB Write in copy_from_lzss_window_to_unp

While extracting RAR files, the fourth argument of copy_from_lzss_window_to_unp, length, is calculated based on the state after lzss decompression, representing the copy length. However, it was incorrectly defined as an int. This mistake allows an attacker to manipulate the lzss data, causing length to become a negative value, bypassing validation checks, and resulting in an out-of-bounds write vulnerability.
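The effect of a signed length can be illustrated with a short sketch. The exact validation in libarchive is not reproduced here: we assume, for illustration only, a check that tests just an upper bound, and the WINDOW_SIZE value is hypothetical.

```python
import ctypes

WINDOW_SIZE = 0x400000  # hypothetical window size, for illustration only


def passes_upper_bound_check(length: int) -> bool:
    # Assumed shape of the validation: only an upper bound is tested.
    return length <= WINDOW_SIZE


raw = 0xFFFFFFF0                     # attacker-influenced 32-bit value
length = ctypes.c_int32(raw).value   # stored in a signed int
as_size_t = length & (2**64 - 1)     # how a 64-bit copy routine would see it

print(length)                             # -16
print(passes_upper_bound_check(length))   # True
print(hex(as_size_t))                     # 0xfffffffffffffff0
```

A negative value sails through the upper-bound test and then turns into an enormous unsigned copy length.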

CVE-2024-20697: OOB Write in execute_filter_e8

This vulnerability also occurs when extracting RAR files. If an RAR file contains an e8 filter, libarchive runs the execute_filter_e8 function. (The filter here is defined by RAR; it is not the libarchive filter we mentioned earlier.) The problem is that although there is a check in execute_filter_e8 to ensure the variable length is greater than or equal to 4, length-5 is then used as the bound of a for loop. So, when length is 4, the subtraction wraps around and the loop runs 0x100000000 times, causing an out-of-bounds write.
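The wraparound can be reproduced with 32-bit modular arithmetic. The sketch assumes that length is a 32-bit unsigned value and that the loop bound is inclusive; both are our assumptions, chosen to match the iteration count quoted above.

```python
MASK32 = 0xFFFFFFFF


def e8_loop_bound(length: int) -> int:
    """Model `length - 5` computed on a 32-bit unsigned integer."""
    if length < 4:
        raise ValueError("rejected by the length >= 4 check")
    return (length - 5) & MASK32


def e8_loop_iterations(length: int) -> int:
    # Assumption: the loop runs for i = 0 .. bound inclusive.
    return e8_loop_bound(length) + 1


print(hex(e8_loop_bound(4)))       # 0xffffffff
print(hex(e8_loop_iterations(4)))  # 0x100000000
print(e8_loop_iterations(5))       # 1
```

Checking for length >= 5 instead would keep the subtraction from wrapping in this model.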

Fuzzing: Why OSS-Fuzz Never Found These?

To reproduce these two CVEs, we must construct RAR files, which is time-consuming. For CVE-2024-20696, we have to construct LZSS data that causes the length to become negative; for CVE-2024-20697, we have to put an e8 filter in an RAR archive. Instead of building RAR files byte by byte, we chose to collect RAR archives, especially those with the e8 filter, and feed them to the AFL++ fuzzer. To our surprise, it only took 56 seconds to find the first crash for CVE-2024-20697:

It’s great that the crash happened quickly, but here’s the big problem: OSS-Fuzz has been fuzzing libarchive 24/7 since at least 2016, so how could a vulnerability that takes 56 seconds to find have gone undiscovered?

From the OSS-Fuzz summary for libarchive, we can see that in June 2024, the code coverage of libarchive was only 15.03%:

And it seems to have been that way for a long time:

From the file view, it is obvious that some file formats are basically untested. For example, the coverage of archive_read_support_format_rar.c is only 4.07%:

The Answer Revealed

While preparing our talks, we noticed a pull request in the libarchive repository. It turns out that though they enabled the DONT_FAIL_ON_CRC_ERROR flag for CMake while compiling libarchive for OSS-Fuzz, they didn’t define that option in CMake!?

The DONT_FAIL_ON_CRC_ERROR flag allows libarchive to continue processing a file even when the CRC check fails. As we all know, fuzzers are generally poor at generating correct checksums. Because the option was never actually defined, the flag had no effect, which means the long-term low coverage of OSS-Fuzz was due to the fuzzer’s inability to produce valid CRC values to pass the checks.
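A toy parser shows why a checksum gate starves a fuzzer. The container format below is invented for this illustration (a 4-byte CRC32 followed by the payload), and the dont_fail_on_crc_error parameter only mimics the idea behind the build flag; it is not libarchive's implementation.

```python
import struct
import zlib


def pack(payload: bytes) -> bytes:
    """Toy container: little-endian CRC32 of the payload, then the payload."""
    return struct.pack("<I", zlib.crc32(payload)) + payload


def parse(blob: bytes, dont_fail_on_crc_error: bool = False):
    """Return the payload, or None when the CRC gate rejects the input."""
    (stored,) = struct.unpack_from("<I", blob)
    payload = blob[4:]
    if zlib.crc32(payload) != stored and not dont_fail_on_crc_error:
        return None  # the deeper parsing code is never reached
    return payload


good = pack(b"interesting parser input")
mutated = bytearray(good)
mutated[-1] ^= 0x01  # a typical single-bit fuzzer mutation
mutated = bytes(mutated)

print(parse(good) is not None)                                    # True
print(parse(mutated) is None)                                     # True
print(parse(mutated, dont_fail_on_crc_error=True) is not None)    # True
```

Almost every mutation breaks the checksum, so without the relaxed mode the fuzzer keeps exercising only the rejection path.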

After the fix, a significant improvement in OSS-Fuzz’s coverage of libarchive can be observed, increasing from 15.03% to 63.10%:

From the file view, the code coverage for individual file formats is also improved:

Keep Fuzzing

While conducting the code review, we kept AFL++ working for us. At the end of our research, the fuzzer found two out-of-bounds read vulnerabilities: CVE-2024-48957 and CVE-2024-48958.

CVE-2024-26256: Libarchive Remote Code Execution Vulnerability

After analyzing CVE-2024-20696 and CVE-2024-20697, the vulnerabilities found by Microsoft, we noticed that both are in archive_read_support_format_rar.c and both are in functions added within the previous three years.

We decided to review RAR to investigate whether there are any other vulnerabilities. The first thing we wanted to understand was what filter_e8 refers to in CVE-2024-20697 and what the “filter” mentioned in the commit message “support RAR filters” means.

The VM in RAR

To understand filter_e8, we must first know that RAR contains a VM! It is a register-based VM that can run a custom program to improve the compression ratio of an RAR file.

When creating an RAR file, a custom “filter” program can be included. The filter_e8 is one such program, designed to improve the compression ratio for Intel binaries. The “e8” in its name refers to the near call opcode in the Intel instruction set.

But what is the correlation between call instruction and improved compression ratio?

Take the following small program as an example: if there are two call instructions that will call funcA, located at addresses 0 and 0x10 respectively, we can see in the machine code that the near call instruction starts with 0xe8, followed by 4 bytes, which corresponds to the rel32 in the manual. The rel32 value represents the relative offset between the address of the target function and the address of the instruction following the call instruction. For example, the rel32 value of the first call instruction is 0x1b, which is calculated by subtracting the address of the next instruction (0+5) from the address of the target function (0x20):

Both instructions are used to call funcA, but due to their different machine code representations, the compression ratio is lower. However, since the rel32 value can be easily calculated, it can be replaced with the absolute address of the target function, making the instructions identical and improving the compression ratio:

While decompressing, the e8 filter will be used to recover the replaced rel32:
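The round trip can be sketched as a toy transform over a byte buffer. This is a simplified illustration of the idea, not the RAR filter itself, which may apply additional conditions that are omitted here; the function names are ours, and the sample program mirrors the two calls to funcA at 0x20 from the example above.

```python
import struct


def e8_encode(code: bytes) -> bytes:
    """Compression side: replace each rel32 after 0xE8 with the absolute target."""
    out = bytearray(code)
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:
            (rel,) = struct.unpack_from("<i", out, i + 1)
            struct.pack_into("<I", out, i + 1, (i + 5 + rel) & 0xFFFFFFFF)
            i += 5
        else:
            i += 1
    return bytes(out)


def e8_decode(data: bytes) -> bytes:
    """Decompression side: turn the absolute target back into rel32."""
    out = bytearray(data)
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:
            (absolute,) = struct.unpack_from("<I", out, i + 1)
            struct.pack_into("<I", out, i + 1, (absolute - (i + 5)) & 0xFFFFFFFF)
            i += 5
        else:
            i += 1
    return bytes(out)


# Two calls to funcA (at 0x20), placed at addresses 0x00 and 0x10, padded with NOPs.
program = bytearray(b"\x90" * 0x20)
program[0x00:0x05] = b"\xe8\x1b\x00\x00\x00"  # 0x20 - (0x00 + 5) = 0x1b
program[0x10:0x15] = b"\xe8\x0b\x00\x00\x00"  # 0x20 - (0x10 + 5) = 0x0b
program = bytes(program)

encoded = e8_encode(program)
print(encoded[0x00:0x05].hex())  # e820000000
print(encoded[0x10:0x15].hex())  # e820000000
```

After encoding, both call sites carry identical bytes, which is what gives the compressor more repetition to work with.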

To simplify the implementation, libarchive doesn’t implement the entire VM. Instead, it calculates a fingerprint of the filter program using crc32 and dispatches to a built-in implementation of the matching filter. The relevant code can be found in libarchive:

static int
execute_filter(struct archive_read *a, struct rar_filter *filter, struct rar_virtual_machine *vm, size_t pos)
{
  if (filter->prog->fingerprint == 0x1D0E06077D)
    return execute_filter_delta(filter, vm);
  if (filter->prog->fingerprint == 0x35AD576887)
    return execute_filter_e8(filter, vm, pos, 0);
  if (filter->prog->fingerprint == 0x393CD7E57E)
    return execute_filter_e8(filter, vm, pos, 1);
  if (filter->prog->fingerprint == 0x951C2C5DC8)
    return execute_filter_rgb(filter, vm);
  if (filter->prog->fingerprint == 0xD8BC85E701)
    return execute_filter_audio(filter, vm);

  archive_set_error(&a->archive, ARCHIVE_ERRNO_FILE_FORMAT, "No support for RAR VM program filter");
  return 0;
}

After RAR v5, filters became an enum type in the file format, making it possible to use only pre-defined filters. This can be observed in the UnRAR source:

enum FilterType {
  // These values must not be changed, because we use them directly
  // in RAR5 compression and decompression code.
  FILTER_DELTA=0, FILTER_E8, FILTER_E8E9, FILTER_ARM, 
  FILTER_AUDIO, FILTER_RGB, FILTER_ITANIUM, FILTER_TEXT, 
  
  // These values can be changed.
  FILTER_LONGRANGE,FILTER_EXHAUSTIVE,FILTER_NONE
};

Code Review

The concept of filters is interesting, and the vulnerabilities that Microsoft found are also related to the filter. So, we conducted a code review of archive_read_support_format_rar.c.

After some time, we discovered a heap buffer overflow vulnerability in copy_from_lzss_window. The length parameter of copy_from_lzss_window is used directly in memcpy without any checks, while the buffer size is only 0x40004 bytes:

From the places where copy_from_lzss_window is called, it can be observed that the function is used to copy data into the VM memory:
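The missing validation can be pictured with a small model of the VM memory. This is a sketch of the general bounds check that the text says was absent, not the fix that Microsoft or libarchive actually shipped; only the 0x40004 size comes from the text, and the helper names are ours.

```python
VM_MEMORY_SIZE = 0x40004  # buffer size quoted in the text


def copy_is_safe(start: int, length: int, size: int = VM_MEMORY_SIZE) -> bool:
    """True when [start, start + length) lies entirely inside the buffer."""
    return 0 <= start and 0 <= length and start + length <= size


def copy_into_vm(vm_memory: bytearray, start: int, data: bytes) -> None:
    """Copy data into the VM memory, refusing anything that would overflow."""
    if not copy_is_safe(start, len(data), len(vm_memory)):
        raise ValueError("copy would fall outside the VM memory")
    vm_memory[start:start + len(data)] = data


vm = bytearray(VM_MEMORY_SIZE)
copy_into_vm(vm, 0, b"abcd")
print(copy_is_safe(0, VM_MEMORY_SIZE))      # True
print(copy_is_safe(0, VM_MEMORY_SIZE + 1))  # False
```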

The vulnerability itself is straightforward, but constructing a valid filter and data is a bit more complex. These elements are not stored in plain sight in the RAR file format; they are actually part of the data. Additionally, the data is encoded using Huffman coding, and it isn’t consumed byte by byte, but rather 7 bits at a time. Since this is not the focus of this article, we won’t delve into the details here, but we encourage readers to attempt reproducing the vulnerability.

The vulnerability is CVE-2024-26256. The reason it was not detected by fuzzing is straightforward: the data must be exactly the same size as the value of filter->blocklength. However, fuzzers often trim files when their coverage is similar.

Half-Day: A 1-Day That Looks Like a 0-Day

When we mentioned earlier the two vulnerabilities the Microsoft Offensive Research Security Engineering (MORSE) team reported, we said that constructing a PoC is a little complex. Perhaps someone immediately thought, “Why not check the GitHub repository of libarchive for test cases or commit messages?” Well, we did look—there was nothing.

Because, at that time, libarchive hadn’t even been patched yet, or perhaps no one even knew the vulnerabilities existed! We can see that Microsoft had already fixed the libarchive fork used in Windows back in January:

However, the corresponding two patches for libarchive were merged in May and April, respectively:

Doesn’t that mean anyone closely following Windows patches would immediately discover that there were two unpatched vulnerabilities in the libarchive upstream? These vulnerabilities would be considered 1-day for the Windows forked version of libarchive, because they have been discovered and patched. However, for libarchive upstream, they are 0-day, as no patch has been made, and the maintainers may even be unaware of the issue! We will refer to this situation as “0.5-day” or “Half-day” in the rest of the article.

So, we began searching for large projects that use libarchive. We wanted to simulate a “Half-day attack” scenario and also believed that vendors incorporating libarchive in their software would be more willing to help us urge libarchive to patch the vulnerabilities.

Attacking ClickHouse

After some investigation, we discovered that ClickHouse uses libarchive for decompression, which likely contains the vulnerable code. In ClickHouse, we can interact with the data inside the archive through the file table engine:

However, the manual also mentions that ClickHouse only supports zip, tar, and 7z file formats:

But is that really the case? Aside from zip, both tar and 7z in ClickHouse are implemented by TarArchiveReader and SevenZipArchiveReader, both of which inherit from LibArchiveReader. The behavior of opening files with LibArchiveReader is implemented in the open function. In the source code, you can see the familiar pattern:

    static struct archive * open(const String & path_to_archive)
    {
        auto * archive = archive_read_new();
        try
        {
            archive_read_support_filter_all(archive);
            archive_read_support_format_all(archive);

Yes, ClickHouse also uses archive_read_support_format_all and archive_read_support_filter_all to initialize libarchive, which means we can trigger the vulnerabilities related to RAR! All we need to do next is have ClickHouse decompress the files for us. Although the current decompression feature allows direct access to files on S3, it could not be used this way at the time:

So, we must upload the file first. The following query will create a new table, which will be stored as a file:

INSERT INTO TABLE FUNCTION
file('poc.7z', 'Native' ,'column1 String') VALUES ('payload')

Even so, the file generated this way would include the table’s metadata. With the metadata present, we cannot make libarchive treat it as a RAR file, thus preventing the vulnerability from being triggered:

00000000: 0101 0763 6f6c 756d 6e31 0653 7472 696e  ...column1.Strin
00000010: 6707 7061 796c 6f61 64                   g.payload
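The dump can be rebuilt byte for byte, which makes the problem visible: the file starts with header bytes rather than a RAR signature. The field interpretation below (column count, row count, then length-prefixed name, type, and value) is our reading of this particular dump, not a full description of the Native format.

```python
RAR_MAGIC = b"Rar!\x1a\x07\x00"


def length_prefixed(text: str) -> bytes:
    """One length byte followed by the string (enough for short strings)."""
    raw = text.encode("ascii")
    return bytes([len(raw)]) + raw


native = (
    bytes([1, 1])                 # assumed: 1 column, 1 row
    + length_prefixed("column1")  # column name
    + length_prefixed("String")   # column type
    + length_prefixed("payload")  # the single value
)

print(native.hex())                  # 010107636f6c756d6e3106537472696e67077061796c6f6164
print(native.startswith(RAR_MAGIC))  # False
```

Even if the value itself were a complete RAR file, it would sit behind these header bytes instead of at offset 0.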

We needed to look for formats in ClickHouse’s Output Data Formats that don’t have metadata at the beginning of the file. We decided to use TabSeparatedRaw because in TabSeparatedRaw:

  • Data is stored row by row.
  • Data within a row is separated by tabs.

For example, if using the following two queries:

INSERT INTO TABLE FUNCTION
file('test.7z', 'TabSeparatedRaw', 'column1 String')
VALUES ('row1 string')

INSERT INTO TABLE FUNCTION
file('test.7z', 'TabSeparatedRaw', 'column1 String')
VALUES ('row2 string')

The content of the generated file would be:

row1 string
row2 string

Nonetheless, there is a constraint: the data cannot contain tabs or newlines. If we can overcome this, we can construct a valid RAR file! So, how can we avoid using tabs or newlines? It sounds complicated, but don’t forget that ClickHouse calls archive_read_support_filter_all to enable all filters for us! The filter that fits TabSeparatedRaw best is UUencode. Data that has been UUencoded looks like this:

begin 600 exploit.rar.uu
M4F%R(1H'`,^0<P``#0````````!287(A&@<`SY!S```-`````````%VI=("0
M,0"VXP0`4,\+``(2'^X4O1EY5QTU#``@````;7-V8W(Q,#`N9&QL`/#<ZFP8
M(AE0S(EB'!(",EM(4I0M-"H*"T6JBHJ!1$1T%IKHFBA0H*:(VV2VP)98R9+*
M"*`T'I;&]1>J]>HZ!>KU4=(ZQTC:4VHB@J4WH2D-%--VSX9S);9IOO.9HF2E
MH47O?/GG[Y^6LSGO/>>Z>>\]YJS!N3_(^_P3SS"O\S`P+6<_V'P(,#`P,#`B
M.EX#-(,!V$Z07A+LVZ?4@K:RB=`NN<+9(IB<?NX``Y[L`3`P,#``,`'/YCL*
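Output of this shape can be produced with Python's binascii module. The sketch below is one possible way to prepare the rows, not the tooling we used; the uu_rows helper is hypothetical and the payload is a placeholder that only begins with the RAR signature. Each returned row is free of tabs and newlines, so it can be stored as one TabSeparatedRaw row.

```python
import binascii


def uu_rows(payload: bytes, name: str, mode: str = "600") -> list[str]:
    """UUencode payload and return one string per output line."""
    rows = [f"begin {mode} {name}"]
    for offset in range(0, len(payload), 45):  # 45 input bytes per line
        chunk = payload[offset:offset + 45]
        line = binascii.b2a_uu(chunk, backtick=True).decode("ascii")
        rows.append(line.rstrip("\n"))
    rows += ["`", "end"]
    return rows


# Placeholder payload: the RAR signature followed by arbitrary bytes.
payload = b"Rar!\x1a\x07\x00" + bytes(range(256))
rows = uu_rows(payload, "exploit.rar.uu")

print(rows[0])      # begin 600 exploit.rar.uu
print(rows[1][:9])  # M4F%R(1H'
```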

Finally, all we need to do is UUencode the original RAR payload and upload it:

INSERT INTO TABLE FUNCTION
file('poc.7z', 'TabSeparatedRaw', 'column1 String') VALUES (uu_encoded_rar_payload)

Ask ClickHouse to decompress it for us:

SELECT * FROM file('poc.7z :: **', RawBLOB)

That would successfully trigger the out-of-bounds write vulnerability:

We reported those issues to the Bug Bounty program of ClickHouse on Bugcrowd:

ClickHouse quickly fixed the issue and was willing to help us urge libarchive to patch the vulnerabilities! It’s unclear whether Microsoft informed the libarchive maintainers about CVE-2024-20696 and CVE-2024-20697, since there’s no public information and the Security Advisories on libarchive’s GitHub repository have no relevant details. In any case, as we mentioned earlier, these two vulnerabilities, initially discovered by Microsoft in its forked version of libarchive, were eventually patched in libarchive in May and April, ending the awkward “Half-day” situation.

Issue Tracking

In addition to reporting the two “Half-day” vulnerabilities mentioned earlier, don’t forget that we also reported three other 0-day vulnerabilities, each of which is as follows:

  1. RCE fixed by Microsoft: CVE-2024-26256
    • Reported to libarchive on 4/27
    • Fixed on 8/14
    • Closed on 9/28
  2. OOB read in filter_audio
    • Reported to libarchive on 3/20
    • Fixed on 4/29
    • Closed on 9/28
  3. OOB read in filter_delta
    • Reported to libarchive on 3/20
    • Fixed on 4/29
    • Closed on 9/28

After several months following the reports, these vulnerabilities were finally patched, by which time it was already September.

The “Half-day” Cycle of Repetition

The most severe one is CVE-2024-26256, a vulnerability we reported to Microsoft, which was already patched on Windows in April.

When we reported the vulnerability CVE-2024-26256 in March, we asked MSRC whether they would submit the patch to libarchive’s GitHub repository, but we didn’t receive a response initially. After Microsoft patched CVE-2024-26256, we followed up to confirm if they had shared the vulnerability information with libarchive maintainers. MSRC replied, “If you wish, we encourage you to open a separate GitHub issue.” To avoid a “Half-day” situation, we immediately created an issue in libarchive’s Security Advisory after receiving their message:

However, the “Half-day” situation still occurred due to a lack of response. In July, we submitted a PR to port Microsoft’s patch to libarchive. While we weren’t sure if this was the best fix, it was certainly better than leaving the issue unaddressed. As a result, history repeated itself, and we found ourselves stuck in a “Half-day” scenario again, from April until the patch was finally completed in September.

The Remaining Two 0-Days

Attentive readers may have already noticed that the issues we reported are not listed under the “Published” tab but rather under “Closed”:

In addition to CVE-2024-26256, which we mentioned earlier, the other two vulnerabilities are out-of-bounds read issues that had not been assigned CVE identifiers at the time. When a patch is linked to an existing CVE number, it is generally understood to be a security fix. However, the other two vulnerabilities were not publicly disclosed. Given that libarchive is widely used across many software applications and services, many users may be unaware they are relying on it. The dependency chains within such software can be large and intricate. If developers or end-users do not recognize that a patch addresses a security issue, the fix may propagate slowly through the dependency chain, significantly increasing the risk exposure.

As a result, after confirming that libarchive had closed the issue, we promptly applied for CVE identifiers for these vulnerabilities. The two vulnerabilities were assigned CVE-2024-48957 and CVE-2024-48958. By the time these were published, it was already October, six months after the patch had been released in April.

Conclusion

This article discusses the vulnerabilities and notable characteristics introduced when Windows adopted libarchive to support additional archive file formats.

We also successfully exploited what we consider “Half-day” vulnerabilities in ClickHouse. These “Half-day” vulnerabilities arise from the fact that after Microsoft forked libarchive and compiled it into the closed-source archiveint.dll, it failed to promptly inform the libarchive maintainers or contribute the patch back to the upstream repository.

The delayed fix in the upstream repository can be attributed to communication delays and the absence of a publicly available patch. The maintainers were only able to address the issue after receiving the report, by which point the forked version had already been patched. Therefore, after patching its forked version of libarchive, Microsoft should have not only notified the original maintainers but also submitted a Pull Request to the upstream repository to facilitate the fix.

Libarchive maintainers are volunteers who may be unpaid. The open-source ethos encourages everyone to “share, collaborate, and contribute” (and much more). Thus, we believe that researchers should not only provide vulnerability analysis and PoCs but also actively propose fixes to help preserve the security and quality of open-source software when reporting vulnerabilities.