If you want to get serious about finding impactful vulnerabilities through static analysis, it’s time to move beyond simply grep-ing through code bases. In this blog post, I’ll share my personal process for setting up a robust environment for Java static analysis of console applications, web applications, and Android applications. Once you’ve established this test environment, you’ll be able to take advantage of automatic code references, trace usages across a code base, and leverage source-to-sink analysis to find elusive vulnerabilities. 😈

1. Choose an Integrated Development Environment (“IDE”)

To make our static analysis much more effective, it’s imperative to leverage a Java IDE. An IDE is essentially a fancy text editor with the superpower of resolving code references to external dependencies such as third-party libraries or the Java SDK. In this post, I’ll use the excellent IntelliJ IDEA IDE from JetBrains, although any other Java IDE, such as Eclipse, will work as well.

2. Obtain a Java Package (.jar, .war, or .apk) to Analyze

When reverse-engineering a Java application, you will typically obtain a package of Java classes in the form of a .jar, .war, or .apk file. Let’s take a look at the differences between these file types with this table:

| Package Type | Description | Example |
| --- | --- | --- |
| .jar (“Java Archive”) | Packaged Java application or library | https://repo1.maven.org/maven2/com/squareup/okio/okio/3.3.0/okio-3.3.0.jar |
| .war (“Web Archive”) | Packaged Java web application | http://www.opencms.org/en/download/ |
| .apk (“Android Package”) | Packaged Android application | https://play.google.com/store/apps/details?id=no.mobitroll.kahoot.android&gl=US |

3. Unpack the Java Package and Obtain .jar Files

The next step is to obtain .jar files from our original Java package. This process will vary slightly depending on the type of package you are analyzing.

Getting .jar Files From a .jar File

If you are analyzing a .jar file, you are ready to decompile! That was easy. Skip ahead to step 4 🙂

Getting .jar Files From a .war File

A .war file is really just a .zip archive, so we can unpack it using the unzip or jar utility:

❯ jar xvf opencms.war

created: META-INF/
created: WEB-INF/
inflated: WEB-INF/cmsshell.sh
created: WEB-INF/setupdata/
created: WEB-INF/setupdata/database/
created: WEB-INF/setupdata/database/db2/
inflated: WEB-INF/setupdata/database/db2/create_db.sql
inflated: WEB-INF/setupdata/database/db2/drop_db.sql
inflated: WEB-INF/setupdata/database/db2/drop_tables.sql
inflated: WEB-INF/setupdata/database/db2/create_tables.sql
created: WEB-INF/setupdata/database/oracle/
inflated: WEB-INF/setupdata/database/oracle/create_db.sql
inflated: WEB-INF/setupdata/database/oracle/drop_db.sql
inflated: WEB-INF/setupdata/database/oracle/drop_tables.sql
inflated: WEB-INF/setupdata/database/oracle/create_tables.sql
# Truncated for brevity...

A quick find command will show us where all the .jar files are hanging out:

❯ find . -name "*.jar"

# Truncated for brevity...
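Since a .war is just a ZIP archive, you can also list the bundled .jar files without unpacking anything. Here’s a minimal Python sketch of that idea using only the standard library. The function name list_bundled_jars is my own invention for illustration, not part of any tool mentioned in this post:

```python
import zipfile


def list_bundled_jars(war_path):
    """Return the sorted names of all .jar entries inside a .war (ZIP) archive."""
    with zipfile.ZipFile(war_path) as war:
        return sorted(
            name for name in war.namelist()
            if name.lower().endswith(".jar")  # also catches .JAR
        )
```

Calling list_bundled_jars("opencms.war") should give roughly the same list as the find command, just without writing anything to disk.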

But a lot of these .jar files look like third party libraries that we might not want to spend time decompiling, at least not initially. We want to find the main logic implemented by the developers of the application we’re looking at to make the best use of our time.

To do this, I always check out the web.xml file, which should be in the WEB-INF/ directory. This file is used by Java application servers to determine which classes to route traffic to. For example, the figure below shows a few interesting classes implementing servlets for the OpenCMS application. From what I’m seeing here, the org.opencms package is looking like a great place to start analyzing.

<!-- Truncated for brevity --->
        The error handling servlet, also serves as trigger for static export requests.

        The main servlet that handles all requests to the OpenCms VFS.
<!-- Truncated for brevity --->
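If web.xml is large, pulling out just the servlet classes can be scripted. The sketch below is one rough way to do it in Python, assuming the file follows the standard deployment descriptor layout with servlet-class elements. servlet_classes is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def servlet_classes(web_xml_text):
    """Return the text of every servlet-class element, in document order."""
    root = ET.fromstring(web_xml_text)
    classes = []
    for element in root.iter():
        # Tags look like "{namespace}servlet-class" when the file declares a
        # default XML namespace, so compare only the local part of the name.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "servlet-class" and element.text:
            classes.append(element.text.strip())
    return classes
```

Feeding it the contents of WEB-INF/web.xml returns the fully qualified class names, and their package prefixes hint at where the application’s own code lives.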

Now, we can use a simple grep to identify .jar files that implement the classes we care about. These matched .jar files seem like great candidates for handling the application’s main logic.

❯ grep -ir org.opencms WEB-INF/lib

Binary file WEB-INF/lib/opencms-setup.jar matches
Binary file WEB-INF/lib/org.opencms.locale.it.jar matches
Binary file WEB-INF/lib/org.opencms.locale.de.jar matches
Binary file WEB-INF/lib/org.opencms.locale.da.jar matches
Binary file WEB-INF/lib/org.opencms.locale.es.jar matches
Binary file WEB-INF/lib/org.opencms.locale.cs.jar matches
Binary file WEB-INF/lib/org.opencms.locale.zh.jar matches
Binary file WEB-INF/lib/opencms.jar matches
Binary file WEB-INF/lib/opencms-resources.jar matches
Binary file WEB-INF/lib/org.opencms.locale.ru.jar matches
Binary file WEB-INF/lib/org.opencms.locale.ja.jar matches
Binary file WEB-INF/lib/opencms-modules.jar matches

Next, let’s move those .jar files to a separate directory, since we’ll focus on decompiling them first.

❯ mkdir jars-we-care-about && grep -ir org.opencms WEB-INF/lib | awk '{print $3}' | xargs -I{} cp {} jars-we-care-about
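That grep works because entry names are stored uncompressed inside a .jar, so the pattern org.opencms (where . matches any character) hits paths such as org/opencms/. Here’s a rough Python sketch that makes the same check explicit and a little stricter by only counting .class entries under the package directory. jars_containing_package is a hypothetical helper name, and the stricter rule may return fewer .jar files than grep does (for example, it skips archives that only carry resources):

```python
import zipfile
from pathlib import Path


def jars_containing_package(lib_dir, package):
    """Return the .jar files in lib_dir that hold a class in `package` or a subpackage."""
    prefix = package.replace(".", "/") + "/"  # org.opencms -> org/opencms/
    matches = []
    for jar_path in sorted(Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar_path) as jar:
            if any(
                name.startswith(prefix) and name.endswith(".class")
                for name in jar.namelist()
            ):
                matches.append(jar_path)
    return matches
```

For the example in this post, the call would be jars_containing_package("WEB-INF/lib", "org.opencms").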

And we’ve extracted the .jar files from our .war file! Let’s see how to do it with an .apk file, and then we’re ready to decompile.

Getting .jar Files From an .apk File

Getting JARs from an .apk file is a bit different because Android applications package their compiled classes in the Dalvik Executable (“DEX”) format instead of JAR. DEX uses its own register-based bytecode rather than the JVM’s stack-based bytecode, but it is normally produced from the same compiled Java classes, so we can transform a .dex file back into a .jar file quite easily.
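You can see the difference between these formats in the first few bytes of each file: DEX files begin with dex followed by a newline, Java class files begin with 0xCAFEBABE, and non-empty ZIP-based packages (.jar, .war, .apk) begin with PK. A small Python sketch, with identify_container and identify_file as hypothetical helper names, shows how to tell them apart:

```python
def identify_container(first_bytes):
    """Classify a file by its leading magic bytes."""
    if first_bytes.startswith(b"dex\n"):
        return "dex"    # Dalvik Executable
    if first_bytes.startswith(b"\xca\xfe\xba\xbe"):
        return "class"  # compiled Java class file
    if first_bytes.startswith(b"PK\x03\x04"):
        return "zip"    # .jar, .war, and .apk are all ZIP archives
    return "unknown"


def identify_file(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as handle:
        return identify_container(handle.read(8))
```

This is handy as a sanity check when a download has a misleading extension.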

We can use unzip or jar once again to unpack the application and identify the .dex files within an .apk file. The output below shows two .dex files identified in the Kahoot Android application.

❯ jar xvf no.mobitroll.kahoot.android.apk

inflated: classes.dex   # Dex file
inflated: classes2.dex  # Dex file
inflated: lib/arm64-v8a/libpruneau.so
inflated: lib/armeabi-v7a/libpruneau.so
inflated: lib/x86/libpruneau.so
inflated: lib/x86_64/libpruneau.so
extracted: assets/audio/TheEnd.mp3
extracted: assets/audio/alt02-answer_010sec.mp3
extracted: assets/audio/alt02-answer_020sec.mp3
extracted: assets/audio/alt02-answer_030sec.mp3
extracted: assets/audio/alt03-answer_010sec.mp3
extracted: assets/audio/alt03-answer_020sec.mp3
extracted: assets/audio/alt03-answer_030sec.mp3
extracted: assets/audio/alt03-answer_060sec.mp3
extracted: assets/audio/alt03-answer_090sec.mp3
extracted: assets/audio/alt03-answer_120sec.mp3
extracted: assets/audio/answer_10sec.mp3
extracted: assets/audio/answer_20sec.mp3
extracted: assets/audio/answer_30sec.mp3
extracted: assets/audio/content_slide_underscore.mp3
extracted: assets/audio/correct_01.mp3

Next, we can use the dex2jar utility to convert the .dex files to .jar files:

❯ d2j-dex2jar classes*
dex2jar classes.dex -> ./classes-dex2jar.jar 
dex2jar classes2.dex -> ./classes2-dex2jar.jar

4. Decompile the .jar Files

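With our desired .jar files in hand, it’s time to decompile them! IntelliJ comes with a great decompiler (Fernflower) that you can call from the command line. Here’s how to do it on macOS. In the command below, -Xmx raises the JVM heap limit for large archives, and -mpm=3 caps the time spent decompiling any single method at three seconds: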

❯ mkdir decompiled-jars-we-care-about # Make a new dir for decompiled jars
❯ java -Xmx7066M -cp /Applications/IntelliJ\ IDEA\ CE.app/Contents/plugins/java-decompiler/lib/java-decompiler.jar org.jetbrains.java.decompiler.main.decompiler.ConsoleDecompiler -mpm=3 jars-we-care-about decompiled-jars-we-care-about

INFO:  Decompiling class org/opencms/module/CmsModuleDependency
INFO:  ... done
INFO:  Decompiling class org/opencms/module/CmsModuleImportExportRepository
INFO:  ... done
INFO:  Decompiling class org/opencms/module/CmsModuleUpdater
INFO:  ... done
INFO:  Decompiling class org/opencms/module/CmsResourceImportData
INFO:  ... done
INFO:  Decompiling class org/opencms/module/CmsModuleVersion
INFO:  ... done
INFO:  Decompiling class org/opencms/module/CmsReplaceModuleInfo
INFO:  ... done
INFO:  Decompiling class org/opencms/module/CmsModuleXmlHandler
INFO:  ... done
INFO:  Decompiling class org/opencms/module/CmsModuleManager
INFO:  ... done
INFO:  Decompiling class org/opencms/module/CmsModule
INFO:  ... done
INFO:  Decompiling class org/opencms/module/Messages
INFO:  ... done
INFO:  Decompiling class org/opencms/module/CmsModuleImportData
# Truncated for brevity...

Now, we can unpack the decompiled .jar files to reveal the decompiled Java source code inside!

❯ cd decompiled-jars-we-care-about
❯ find . -name "*.jar" | xargs -I{} jar xvf {}  # Unpack all the decompiled JARs

created: META-INF/
created: org/
created: org/opencms/
created: org/opencms/setup/
created: org/opencms/setup/ui/
created: org/opencms/setup/db/
created: org/opencms/setup/db/update6to7/
created: org/opencms/setup/db/update6to7/oracle/
created: org/opencms/setup/db/update6to7/postgresql/
created: org/opencms/setup/db/update6to7/mysql/
created: org/opencms/setup/db/update7to8/
created: org/opencms/setup/db/update7to8/oracle/
created: org/opencms/setup/db/update7to8/postgresql/
created: org/opencms/setup/db/update7to8/mysql/
created: org/opencms/setup/updater/
created: org/opencms/setup/updater/dialogs/
created: org/opencms/setup/xml/
created: org/opencms/setup/comptest/
inflated: org/opencms/setup/ui/CmsSetupStep02ComponentCheck.html
inflated: org/opencms/setup/ui/CmsSetupErrorDialog.html
inflated: org/opencms/setup/ui/CmsSetupStep03Database.html
inflated: org/opencms/setup/ui/CmsSetupStep01License.html
inflated: org/opencms/setup/ui/CmsSetupStep06ImportReport.html
inflated: org/opencms/setup/ui/CmsSetupStep07ConfigNotes.html
inflated: org/opencms/setup/ui/CmsSetupStep05ServerSettings.html
inflated: org/opencms/setup/ui/CmsSetupStep04Modules.html
inflated: org/opencms/setup/ui/CmsDbSettingsPanel.html
# Truncated for brevity...

We’re left with .java source files!

❯ find . -name "*.java"

# Truncated for brevity...

5. Set Up Your IDE

Now that we have (basically) Java source code, it’s time to load it into our IDE and start finding vulnerabilities.

Open Your Decompiled Java Code

First, fire up IntelliJ and click Open.

[Figure: Opening a Project in IntelliJ]

Now, select the directory of decompiled and expanded .jar files we created.

[Figure: Selecting JAR Files in IntelliJ]

Resolve Dependencies

This is where the true magic happens. We’re going to point IntelliJ to the original .jar files we extracted (not the decompiled ones) so that it can resolve the application’s dependencies and let us perform in-depth static analysis.

Go to File → Project Structure… and first select a Java SDK that makes sense for the project you’re analyzing. Be sure to click Apply.

[Figure: Selecting Java SDK]

Next, go to the Libraries tab and click - to remove any current dependencies, as these are decompiled .jars and won’t resolve things properly.

[Figure: Deselecting Decompiled JARs]

Click + to add a new project library and choose Java:

[Figure: Adding New Project Library]

Select the folder containing the original, compiled .jar files. Include the third-party dependencies too, since we want to be able to resolve them as well!

[Figure: Resolving Dependencies]

Hit Apply and then OK.

[Figure: Apply and OK]

Additional Unresolved Dependencies

It’s possible that after adding all the original .jars as libraries, you’ll still have some unresolved dependencies within your project. These will show up red within IntelliJ.

[Figure: Broken Dependencies]

This is likely because the application you are analyzing assumes that the system running it already has these dependencies on its class path. For example, an Android APK expects the Android SDK to be present, and a web application expects the javax.servlet package to be present.

Sometimes, you can resolve these dependencies quickly by clicking on the red lightbulb within IntelliJ and then Find JAR on web.

[Figure: Find JAR on Web]

Otherwise, you can navigate back to File → Project Structure… and add a new library from Maven (a widely used Java build tool and package repository).

[Figure: Add Library from Maven]

Search for and select the dependency you need to resolve. IntelliJ will download the .jar from Maven and add it to your project.

[Figure: Completing JAR Download]

6. Profit!

I know it took a bit to get here, but by investing time up front, you are now prepared to find vulnerabilities that would be extremely hard to spot otherwise.

Finding Usages

One of the pain points of manual static analysis is the tedious process of tracing user-controllable input through the application. Well, with our current setup, we have made our lives much easier. We can click on any variable, method, or class and find all of its assignments and usages throughout the application. For example, we might want to understand how the class variable m_settings is being used throughout the application:

[Figure: Find Usages Usage]

IntelliJ’s ability to map out all read and write occurrences of a variable in question greatly speeds up manual static analysis.

[Figure: Read and Write Occurrences]

Data Flow Analysis

IntelliJ goes beyond just finding usages, though. It can fully map out the flow of data within the application. To do this, we right-click on a variable and go to Analyze → Data Flow to Here… or Data Flow from Here… Below, we are doing this to understand how the req variable of the doGet() method of an HTTP servlet is being used.

[Figure: Data Flow Usage]

We can then easily view, in one single place, everywhere the req variable is passed into other functions within the rest of the application. This is extremely useful during static analysis.

[Figure: Viewing Data Flow]

Sink-to-Source Analysis

With these tools under our belt, we have the ability to find truly impactful vulnerabilities. We can leverage source-to-sink analysis (also called taint analysis), where we trace the flow of data from dangerous functions back to parameters we control. For a more in-depth definition of source-to-sink analysis, check out one of my previous blog posts: https://www.mandiant.com/resources/blog/route-sixty-sink-launch.
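To make the idea concrete, here’s a toy Python sketch of what tracing back from a sink looks like. It assumes we already have a call graph as a dictionary mapping each method to the methods it calls. The graph, the method names, and chains_to_sink are all made up for illustration, and real tools like IntelliJ track data flow in far more detail than this:

```python
from collections import defaultdict, deque


def chains_to_sink(call_graph, sink, sources):
    """Return every acyclic call chain that starts at a source and ends at `sink`."""
    # Invert the graph so we can walk from the sink back towards its callers.
    callers = defaultdict(list)
    for caller, callees in call_graph.items():
        for callee in callees:
            callers[callee].append(caller)

    chains = []
    queue = deque([[sink]])  # each item is a chain ordered from first call to sink
    while queue:
        chain = queue.popleft()
        head = chain[0]
        if head in sources:
            chains.append(chain)
        for caller in callers[head]:
            if caller not in chain:  # skip recursion cycles
                queue.append([caller] + chain)
    return chains


# Hypothetical call graph: two entry points reach the same deserialization sink.
example_graph = {
    "DemoServlet.doGet": ["SessionStore.load"],
    "NightlyJob.run": ["SessionStore.load"],
    "SessionStore.load": ["ObjectInputStream.readObject"],
}
for chain in chains_to_sink(
    example_graph, "ObjectInputStream.readObject", {"DemoServlet.doGet"}
):
    print(" -> ".join(chain))
```

Only the chain starting at the servlet entry point is reported. NightlyJob.run reaches the same sink, but in this toy setup it isn’t something an attacker controls, so it isn’t listed as a source.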

For example, let’s say we want to look for unsafe deserialization vulnerabilities (they just so happen to be one of my favorite vulnerability classes). We know that a call to java.io.ObjectInputStream::readObject() on attacker-controlled data can result in SSRF, file access, or even remote code execution. Let’s see how to search for Java object deserialization within our application.

First, press ⌘ + O and search for the java.io.ObjectInputStream class:

[Figure: Searching For ObjectInputStream]

Identify the readObject() method and view its data flow throughout the application:

[Figure: Read Object Data Flow]

We immediately have a detailed call graph leading to our sink. If we can trace it to input that we control as an attacker (an HTTP parameter, header, etc.), we’ve found a vulnerability!

[Figure: Data Flow Results]

Thanks for reading! Keep an eye out for more security content coming your way soon! Stay safe out there! 🔒💻👨‍💻