External Third-party Resources and Your (Web) Application

In the first quarter of 2018 we were involved in numerous security assessments of web applications with specific threat models, either due to their place in the infrastructure (intranet-only) or their general sensitivity (data). All of these applications were delivered by different software vendors, but interestingly one security issue was prevalent across all of them: inclusion of resources from external third parties (the majority of which were different versions of jQuery and media files).

This alone is a minor security issue for non-sensitive Internet-facing systems and its risk is usually accepted by the business; a common case is CDN usage, where the performance pros outweigh the security cons. However, for applications that are designed to be intranet-only or are very sensitive (think Level 3 of OWASP ASVS) the severity is higher, up to the point where I could rate it high for certain systems.

Tell me more

For example, think about an internal web application for project management that (1) is accessible by all employees remotely through VPN, and (2) makes use of external JavaScript code. In this scenario an attacker cannot attack the system directly because it is not visible to the outside world. However, since the system embeds resources from an external third party, that party can be targeted in order to attack the system's users (via malicious client-side JavaScript) and later the system itself. In this case I would rate the severity at least as medium.

Another example is a web application that handles sensitive data such as any banking website. Would you really want to execute JavaScript from an external resource while you’re doing sensitive operations that involve money? Of course not.

On a meta-level it does not really matter what type of resource we are talking about; from an architectural perspective the following examples are the same thing:

  • Media files, CSS, and JavaScript code in case of a web application (security posture of your provider has an impact on you, moreover security of the communication channel used to transfer the resource is now relevant. See CWE-1016, CWE-829, CWE-830);
  • Dynamic libraries in case of desktop and mobile applications (all vulnerabilities of these libraries are now your problem, you also need to make sure you’re loading what you think you’re loading. See CWE-417, CWE-426, CWE-427, and When third-party components become a source of all evil).

The risk associated with these examples is different but the pattern is the same.

I will verify the resource

Yes, in the case of web applications you could use Subresource Integrity (SRI) to verify that an external resource is in fact what it should be. However, it does not solve the root of the problem, which is an internal or sensitive system that incorporates untrusted external third parties; it only lowers the risk for your users to a certain degree.
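As a minimal sketch of what SRI pins, the integrity value is an algorithm name followed by the base64-encoded digest of the exact file content. The URL and file content below are hypothetical and serve only as an illustration:

```python
import base64
import hashlib


def sri_value(content, algorithm="sha384"):
    """Return an SRI integrity value such as 'sha384-<base64 digest>'."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError("SRI allows only sha256, sha384 and sha512")
    digest = hashlib.new(algorithm, content).digest()
    return "{}-{}".format(algorithm, base64.b64encode(digest).decode("ascii"))


def script_tag(url, content):
    """Build a <script> tag that pins the external file to its current content."""
    return ('<script src="{}" integrity="{}" crossorigin="anonymous"></script>'
            .format(url, sri_value(content)))


if __name__ == "__main__":
    # Hypothetical URL and body, not taken from any real provider.
    body = b"console.log('hello');\n"
    print(script_tag("https://cdn.example.com/lib.js", body))
```

If the provider later changes even a single byte of the file, the digest no longer matches and a supporting browser refuses to run it.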

Additionally, as-is there are two major issues with SRI:

  • It only applies to JavaScript (<script> tag) and CSS (<link> tag) hence if we embed an external media file (e.g. image with <img> tag) then the integrity of it will not be validated so it can still be used for attacking the users (e.g. via 1-day memory corruption in image parsing code);
  • Adoption across desktop and mobile ecosystems is not full yet.

Figure 1. Support for desktop browsers via MDN

Figure 2. Support for mobile browsers via MDN

Figure 3. Support for all browsers via CanIUse.com

Therefore, in my opinion SRI is not a complete solution to the problem until it (1) is able to handle all elements (a note in section 3.4 of the SRI specification suggests that this is likely to happen), and (2) is supported by all major desktop and mobile browsers (which should happen in a reasonable timeframe).
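Given these gaps, it helps to know which external resources a page pulls in and which of them SRI can cover at all. The following is a rough sketch, not a complete scanner: it only looks at src attributes and at href on <link>, and the host names in the usage example are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tags whose integrity attribute is checked, per the limitation described above.
SRI_CAPABLE = {"script", "link"}


class ExternalResourceAudit(HTMLParser):
    """Collect resources loaded from hosts other than our own."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.findings = []  # list of (tag, url, status)

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        url = attr.get("src") or (attr.get("href") if tag == "link" else None)
        if not url:
            return
        host = urlparse(url).netloc
        if not host or host == self.own_host:
            return  # relative or same-host resource
        if tag not in SRI_CAPABLE:
            status = "SRI not applicable"
        elif attr.get("integrity"):
            status = "integrity set"
        else:
            status = "integrity missing"
        self.findings.append((tag, url, status))


def audit(html, own_host):
    parser = ExternalResourceAudit(own_host)
    parser.feed(html)
    return parser.findings


if __name__ == "__main__":
    page = ('<script src="https://code.example.net/jquery.js"></script>'
            '<img src="https://media.example.org/logo.png">')
    for finding in audit(page, "intranet.example.com"):
        print(finding)
```

Anything reported as "SRI not applicable" can only be dealt with by hosting the resource yourself or removing it.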

I trust my providers

By its nature trust is a transitive relation: if Party A trusts Party B (explicitly), and Party B trusts Party C (explicitly), then Party A also trusts Party C (implicitly), even though Party A might not be aware of the existence of Party C.

Figure 4. Transitivity of trust

Note that this chain of trust could be longer, e.g. if JavaScript from Party C would interact with further external third-parties.
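The transitivity described above can be sketched as a reachability walk over a graph of explicit trust relations. The party names are the placeholders used in the text, and Party D is added only to show a longer chain:

```python
def implicit_trust(explicit, start):
    """Return every party that `start` ends up trusting, directly or indirectly.

    `explicit` maps a party to the set of parties it trusts explicitly.
    """
    seen = set()
    stack = [start]
    while stack:
        party = stack.pop()
        for trusted in explicit.get(party, set()):
            if trusted != start and trusted not in seen:
                seen.add(trusted)
                stack.append(trusted)
    return seen


if __name__ == "__main__":
    explicit = {"A": {"B"}, "B": {"C"}, "C": {"D"}}
    # A only chose B, yet the chain reaches C and D as well.
    print(sorted(implicit_trust(explicit, "A")))
```

In practice Party A rarely has the full map, which is exactly the problem: the walk above is only possible when every link in the chain is known.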

Having said that, the problem is clear: When you make use of resources from untrusted external third-parties — and all of them are untrusted to some degree — you never know how long the chain of trust really is and have no control over it (other than complete removal).

Conclusion

Whenever an application makes use of an external third-party resource it incorporates all risks associated with the resource itself and its provider. That is why applications should limit their usage of external third-party resources to the absolute minimum that is justified by the business case. Security is always a trade-off, so think about your threat model before going the easy route.

Tracking Input with DTrace on OS X

When performing reverse engineering, whether for vulnerability research or malware analysis, at some point you will need to track input data. Usually one would start from the point of entry of the input and follow the code flow from there. This can be achieved with a debugger: setting breakpoints on interesting points and slowly moving through each break, inspecting arguments and return values of called functions. On top of that, if our debugger of choice supports scripting then we can try to automate this process. In this post I will focus on introducing similar automated functionality with DTrace on the OS X platform (some minor tweaks may be required for other supported platforms).

Plan of action

Our solution will be simple. We will try to mimic what we would normally do under the debugger and to achieve this we will put DTrace probes at places that are interesting to us. For the following article these are open() and read() functions from the C library. Of course the more you know about your targeted application the better choice you will make.

To list all functions that can be probed run $ sudo dtrace -ln 'pid$target:::entry {}' -c .

Our plan of action can be summarised as:

1. Enumerate interesting functions
2. Set-up probes at points of interest
3. Dump the data
4. Look for known input

For this article we already have our targets, but if that were not the case we could, for example, trace the open() system call via the syscall provider and dump its first argument along with the user callstack. Usually this would give us a nice overview of the code flow. For setting up probes we could also use the syscall provider, however I will use the pid provider to gain better performance. Data dumping is done via tracemem on the appropriate pointers used by read(). Finally, looking for known input is done by the machine operator: you.

DTrace does not support loops or actual if statements (probe predicates and ternary operator do not count) and that is why we cannot fully automate our script, hence requirement for manual inspection.
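The manual inspection step can be partly offloaded to a small post-processing script. The sketch below assumes the trace was saved to a file and that the dumps use the usual tracemem row layout (offset, colon, up to 16 hex byte columns, ASCII column). It concatenates all dumped rows, so a match spanning two separate read() dumps is possible and should be double-checked by hand.

```python
import re

# One tracemem row: offset, colon, then up to 16 two-digit hex byte columns.
ROW = re.compile(r"^\s*[0-9a-f]+:\s+((?:[0-9a-f]{2}(?:\s+|$)){1,16})", re.IGNORECASE)


def dump_to_bytes(trace_text):
    """Rebuild the raw bytes from every tracemem row found in the trace."""
    data = bytearray()
    for line in trace_text.splitlines():
        match = ROW.match(line)
        if match:
            data += bytes.fromhex("".join(match.group(1).split()))
    return bytes(data)


def find_known_input(trace_text, needle):
    """Return the offsets (within the rebuilt bytes) where `needle` occurs."""
    data = dump_to_bytes(trace_text)
    hits = []
    start = data.find(needle)
    while start != -1:
        hits.append(start)
        start = data.find(needle, start + 1)
    return hits


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python find_input.py VOX.trace "Traveler"
    with open(sys.argv[1], "r", errors="replace") as handle:
        print(find_known_input(handle.read(), sys.argv[2].encode()))
```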

This approach is somewhat similar to what Peter and Brandon did in MindshaRE couple of years ago. But, as opposed to Peter we do not need to manually patch any particular function, just observe at the point of entry/return which is similar to Detours mentioned in the comments section of his post.

Implementation

First of all we want to probe entry of the open() function along with a predicate on the file that is interesting to us:

pid$target::__open:entry
/copyinstr(arg0) == "/Users/ad/Desktop/test.mp3"/
{
self->fname = copyinstr(arg0);
self->openok = 1;
}

The only actions we are taking inside of this probe are setting thread-local variables self->fname and self->openok which we will use in our next probe:

pid$target::__open:return
/self->openok/
{
trackedfd[arg1] = 1;
printf("Opening %s with fd %#x\n", self->fname, arg1);
self->fname = 0;
self->openok = 0;
}

As you can see, the probe is set on the return of open() and we are using the self->openok variable as a condition to make sure we are in the proper open() return (execution wise). Inside of the probe we are doing a couple of things:

  • Setting a flag for opened file descriptor inside of the global array trackedfd[] (arg1 holds return value)
  • Printing out logging information
  • Freeing variables

After this we are ready to monitor any function that makes use of marked file descriptor. In our case this function is read():

pid$target::read:entry
/trackedfd[arg0] == 1/
{
self->rfd = arg0;
self->rbuf = arg1;
self->rsz = arg2;
}

pid$target::read:return
/self->rfd/
{
printf("Reading from fd %#p to buf %#p size %#x\n", self->rfd, self->rbuf, self->rsz);
tracemem(copyin(self->rbuf, arg1), 64);
ustack(); printf("\n");
self->rfd = 0;
self->rbuf = 0;
self->rsz = 0;
}

Probe set on entry of read() should be self-explanatory by now. The probe set on return does logging, dumping of read()'s destination buffer, and displaying user-mode callstack.

As a last step we will zero-out file descriptor flag stored by trackedfd[] array in close() function:

pid$target::close:entry
/trackedfd[arg0] == 1/
{
trackedfd[arg0] = 0;
}

After putting it all together we get the following script:

#!/usr/sbin/dtrace -s

#pragma D option destructive
#pragma D option quiet

BEGIN
{
trackedfd[0] = 0;
}

pid$target::__open:entry
/copyinstr(arg0) == "/Users/ad/Desktop/test.mp3"/
{
self->fname = copyinstr(arg0);
self->openok = 1;
}

pid$target::__open:return
/self->openok/
{
trackedfd[arg1] = 1;
printf("Opening %s with fd %#x\n", self->fname, arg1);
self->fname = 0;
self->openok = 0;
}

pid$target::read:entry
/trackedfd[arg0] == 1/
{
self->rfd = arg0;
self->rbuf = arg1;
self->rsz = arg2;
}

pid$target::read:return
/self->rfd/
{
printf("Reading from fd %#p to buf %#p size %#x\n", self->rfd, self->rbuf, self->rsz);
tracemem(copyin(self->rbuf, arg1), 64);
ustack(); printf("\n");
self->rfd = 0;
self->rbuf = 0;
self->rsz = 0;
}

pid$target::close:entry
/trackedfd[arg0] == 1/
{
trackedfd[arg0] = 0;
}

You can see that I have silently added 2 #pragmas, you can read about them here. I have also used BEGIN clause to initialise global array trackedfd[].

Usage example

For a quick and simplified example of tracing I will use the VOX music player, which is freely available on the Mac App Store, so without further ado:

Wed May 13 08:24 PM ttys008 [ad@mbp ~]
$ sudo ./fileinput.d -p 31337 > VOX.trace
^C

Wed May 13 08:24 PM ttys008 [ad@mbp ~]
$ less VOX.trace

Opening /Users/ad/Desktop/test.mp3 with fd 0x15
Opening /Users/ad/Desktop/test.mp3 with fd 0x15
Reading from fd 0x15 to buf 0x111fda108 size 0x1000

     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f  0123456789abcdef
 0: 49 44 33 03 00 00 00 00 23 76 54 49 54 32 00 00  ID3.....#vTIT2..
10: 00 1b 00 00 00 54 72 61 76 65 6c 65 72 20 69 6e  .....Traveler in
20: 20 74 68 65 20 57 6f 6e 64 65 72 6c 61 6e 64 54   the WonderlandT
30: 59 45 52 00 00 00 05 00 00 00 32 30 30 35 54 50  YER.......2005TP

libsystem_kernel.dylib`read+0x14
libbass.dylib`BASS_ErrorGetCode+0x1e1

[ ... ]

We seem to have successfully tracked our input, but the callstack does not look good (it seems too small). Disassembling libbass.dylib and jumping to BASS_ErrorGetCode+0x1e1 results in the following code:

IDA bad read

This code chunk is unusual. It does not contain any references (that is why IDA fails to recognise it as a function) and it lacks a function prologue (that is why DTrace fails to display the full callstack). Most probably it is reached through a dynamic call; we can verify this assumption by inspecting the application inside of lldb:

(lldb) attach -p 31337
Process 31337 stopped
* thread #1: tid = 0x250206, 0x00007fff977ad4de libsystem_kernel.dylib`mach_msg_trap + 10, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
frame #0: 0x00007fff977ad4de libsystem_kernel.dylib`mach_msg_trap + 10
libsystem_kernel.dylib`mach_msg_trap:
->  0x7fff977ad4de : ret
0x7fff977ad4df : nop

libsystem_kernel.dylib`mach_msg_overwrite_trap:
0x7fff977ad4e0 : mov r10, rcx
0x7fff977ad4e3 : mov eax, 0x1000020

Executable module set to "/Applications/VOX.app/Contents/MacOS/VOX".
Architecture set to: x86_64-apple-macosx.
(lldb) image list

[ ... ]

[230] 0x000000010186d000 /Applications/VOX.app/Contents/Frameworks/VXBass.framework/Versions/A/libbass.dylib

[ ... ]

(lldb) b 0x00000001018757ae
Breakpoint 1: where = libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4, address = 0x00000001018757ae
(lldb) c
Process 31337 resuming
Process 31337 stopped
* thread #18: tid = 0x2504d9, 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4, stop reason = breakpoint 1.1
frame #0: 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4
libbass.dylib`___lldb_unnamed_function122$$libbass.dylib:
-> 0x1018757ae : mov rax, rdx
0x1018757b1 : mov edx, esi
0x1018757b3 : mov rsi, rdi
0x1018757b6 : mov edi, eax
(lldb) x/80x $rdi
0x1087a9108: 0x00000000 0x00000000 0x00000000 0x00000000
0x1087a9118: 0x00000000 0x00000000 0x00000000 0x00000000
0x1087a9128: 0x00000000 0x00000000 0x00000000 0x00000000
0x1087a9138: 0x00000000 0x00000000 0x00000000 0x00000000

[ ... ]

[ ... ]

[ ... ]

(lldb) c
Process 31337 resuming
Process 31337 stopped
* thread #18: tid = 0x2504d9, 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4, stop reason = breakpoint 1.1
frame #0: 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4
libbass.dylib`___lldb_unnamed_function122$$libbass.dylib:
-> 0x1018757ae : mov rax, rdx
0x1018757b1 : mov edx, esi
0x1018757b3 : mov rsi, rdi
0x1018757b6 : mov edi, eax
(lldb) x/80x $rdi
0x1087b2199: 0x00000000 0x00000000 0x00000000 0x00000000
0x1087b21a9: 0x55555500 0x55555555 0x55555555 0x55555555
0x1087b21b9: 0x54474154 0x65766172 0x2072656c 0x74206e69
0x1087b21c9: 0x57206568 0x65646e6f 0x6e616c72 0x00000064
0x1087b21d9: 0x73755300 0x20756d75 0x6f6b6f59 0x00006174
0x1087b21e9: 0x00000000 0x00000000 0x00000000 0x53000000
0x1087b21f9: 0x6f626d79 0x0000006c 0x00000000 0x00000000
0x1087b2209: 0x00000000 0x00000000 0x00000000 0x30303200
0x1087b2219: 0x20202035 0x20202020 0x20202020 0x20202020
0x1087b2229: 0x20202020 0x20202020 0x20202020 0x0c030020

[ ... ]

(lldb) bt
* thread #18: tid = 0x2504d9, 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4, stop reason = breakpoint 1.1
* frame #0: 0x00000001018757ae libbass.dylib`___lldb_unnamed_function122$$libbass.dylib + 4
frame #1: 0x0000000101879559 libbass.dylib`___lldb_unnamed_function171$$libbass.dylib + 163
frame #2: 0x00007fff95f36268 libsystem_pthread.dylib`_pthread_body + 131
frame #3: 0x00007fff95f361e5 libsystem_pthread.dylib`_pthread_start + 176
frame #4: 0x00007fff95f3441d libsystem_pthread.dylib`thread_start + 13
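The 32-bit words printed by x/80x are easier to read once converted back to bytes. The sketch below assumes a little-endian target (true for x86_64) and simply reverses the byte order of each word; fed with the second dump above, it yields the string "TAGTraveler in the Wonderland", which looks like the start of an ID3v1 tag.

```python
import re
import struct


def words_to_bytes(lldb_lines):
    """Convert `x/<n>x` output rows (address: 0xWORD 0xWORD ...) to raw bytes."""
    out = bytearray()
    for line in lldb_lines.splitlines():
        _, sep, rest = line.partition(":")
        if not sep:
            continue  # not a memory row
        for word in re.findall(r"0x([0-9a-fA-F]{8})\b", rest):
            # Each word is shown as a number; memory holds it least significant byte first.
            out += struct.pack("<I", int(word, 16))
    return bytes(out)


def printable(data):
    """Render bytes the way a hexdump ASCII column would."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


if __name__ == "__main__":
    rows = ("0x1087b21b9: 0x54474154 0x65766172 0x2072656c 0x74206e69\n"
            "0x1087b21c9: 0x57206568 0x65646e6f 0x6e616c72 0x00000064\n")
    print(printable(words_to_bytes(rows)))
```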

Following the missing callstack entry leads us to a dynamic function call located at libbass+0xc556:

loc_10187951F:
mov     ecx, [rbp+0F8h]
mov     eax, [rbp+0FCh]
add     rax, [rbp+100h]
mov     edx, ecx
mov     rbx, rdx
xor     edx, edx
div     rbx
mov     eax, edx
sub     ecx, edx
cmp     esi, ecx
mov     r12d, ecx
cmovbe  r12d, esi
mov     rdx, [rbp+50h]
cdqe
lea     rdi, [r14+rax]
mov     esi, r12d
call    qword ptr [rbp+40h] ; dynamic call into read wrapper
mov     ebx, eax
cmp     eax, 0FFFFFFFFh
jnz     short loc_1018

This seems to be the function that dispatches data reads and probably somewhere down the road parsing is taking place.
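The libbass+0xc556 offset can be re-derived from the lldb output: subtract the module base reported by image list from the return address in frame #1, then step back over the call instruction. The 3-byte length is an assumption based on the usual encoding of call qword ptr [rbp+40h] (FF 55 40).

```python
LIBBASS_BASE = 0x10186D000      # load address from `image list`
RETURN_ADDRESS = 0x101879559    # frame #1 in the `bt` output
BREAKPOINT = 0x1018757AE        # address passed to `b`
CALL_LENGTH = 3                 # assumed size of `call qword ptr [rbp+40h]`


def module_offset(address, base):
    """Translate a runtime address into an offset inside the module."""
    if address < base:
        raise ValueError("address lies below the module base")
    return address - base


if __name__ == "__main__":
    call_site = module_offset(RETURN_ADDRESS, LIBBASS_BASE) - CALL_LENGTH
    print(hex(call_site))                                # 0xc556
    print(hex(module_offset(BREAKPOINT, LIBBASS_BASE)))  # 0x87ae
```

The same subtraction works in the other direction when a static offset from the disassembler needs to be turned into a breakpoint address for the current run.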

You have probably also noticed that the callstack from lldb, although better than the one in DTrace's output, is still poor. This is the result of pthreads usage, which cripples our dynamic analysis.

Conclusions

We successfully found the code responsible for input entry, and from there we could start a more tailored tracing operation in order to find the code responsible for parsing data. The real strength of DTrace lies in fast ad-hoc analysis: we can quickly gain a lot of useful information which would otherwise require more work. DTrace has its own limitations and it is not a silver bullet (after all, we needed to use lldb), however one can often save a lot of time utilising its power, which comes for free with any OS X installation.

Why You Should Use Radamsa

During my time at Secunia I’ve seen a lot of fuzzing results published either publicly or privately (via SVCRP). What struck me at the time was that most of them were made via random bit flip. While this approach is certainly the easiest and fastest to implement and execute, there are other ways to mutate data. One of them is to use Radamsa.

They say that a picture is worth a thousand words, hence we will make a comparison between random bit flip and radamsa with images and our eyes as parsers (more scientific, thus correct, approach would be to collect and compare code coverage data).

Below you can see the results of a random bit flip approach on this seed file (1-to-256 changes of 1/2/4 byte(s) size):

Bit flip

The images are broken in a chaotic fashion. Additionally they all seem to be quite similar.
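For reference, the random mutation described above could be sketched as follows. This is an assumption about the approach rather than the exact tool used for the grids: each run applies between 1 and 256 changes, and each change XORs a window of 1, 2 or 4 bytes at a random offset with a non-zero random mask.

```python
import random


def mutate(seed_data, rng, max_changes=256):
    """Return a copy of `seed_data` with random bits flipped in small windows."""
    data = bytearray(seed_data)
    for _ in range(rng.randint(1, max_changes)):
        size = rng.choice((1, 2, 4))
        if size > len(data):
            continue  # input too small for this window size
        offset = rng.randrange(len(data) - size + 1)
        for index in range(offset, offset + size):
            data[index] ^= rng.randint(1, 255)  # non-zero mask flips at least one bit
    return bytes(data)


if __name__ == "__main__":
    rng = random.Random(1234)  # fixed seed so a crashing case can be reproduced
    original = bytes(range(256)) * 4
    mutated = mutate(original, rng)
    changed = sum(1 for a, b in zip(original, mutated) if a != b)
    print("changed bytes:", changed)
```

Because the offsets are chosen blindly, the mutator has no notion of file structure, which matches the chaotic damage visible in the grid.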

Now, using radamsa we can get a somewhat different set of mutations:

Radamsa

This time images are less broken and mutations seem to be less chaotic. Also we can observe more structural variations (e.g. re-ordered chunks).

We can clearly spot the differences between these two approaches and so can parsers. Hence, the next time you design a fuzzing operation, you should think about incorporating radamsa as one of your mutation engines.

To craft the grids I used ImageMagick's montage tool (hence we basically tested how ImageMagick's parser sees things).