Application security continues to play an important role in software development. Every self-respecting company includes source code analysis, dependency scanning, and other security measures in its CI/CD pipelines. However, you need to do more to protect yourself against malware and software infringement, and to support digital forensics. Digital fingerprinting helps to tackle these advanced security topics.
Context and problem area
To capture every potential security threat in your software applications, the conventional tools you already use are not always enough. Source code scanners identify security misconfigurations, security bugs, and secrets hidden in the code. Dependency scanners detect vulnerabilities, forbidden licenses, and other security flaws. Furthermore, dynamic application security analysis scans a running application for potential security issues.
What is missing here is an in-depth understanding of the true behavior of an application (component). Threats such as malware, crypto miners, and backdoors tend to hide their true characteristics. In a forensic investigation, there is also a need to do a deep dive into the actual software packages.
Digital fingerprinting helps to recover behavior and characteristics that were lost during the compilation or obfuscation of source code. Think of the original authors, the methods in the source code, or the licenses used. These aspects are often deliberately obfuscated, which lets a bad actor hide malware, backdoors, and similar threats.
Another use case is reverse engineering. You can use digital fingerprinting to identify the compilers that were used to produce a binary artifact, conduct author analysis, and much more.
Where to start
Digital fingerprinting is a complex topic, so it helps to have a few solid sources to get started.
Binsign
The authors of the Binsign paper give a clear overview of their framework. It captures fingerprints based on the syntactic, semantic, and structural information of a function. The framework first generates fingerprints from the input binaries and then matches them against known fingerprints. By generating, collecting, and comparing fingerprints in this way, the hidden behavior of the selected binaries can be revealed. The matching feature is especially useful since there is already a huge database of function fingerprints.
In addition to these features, the paper compares Binsign with other existing tools on the market. It also details the algorithms that are used, as well as the components that compare function features before they are evaluated.
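To make the generate-then-match flow more concrete, here is a minimal sketch in Python. It is not Binsign's actual algorithm: the feature choice (mnemonic counts plus two control-flow-graph numbers), the function names `fingerprint` and `lookup`, and the `known` database are all hypothetical and only illustrate the general idea of combining syntactic and structural information into a comparable fingerprint.

```python
import hashlib
from collections import Counter


def fingerprint(mnemonics, num_blocks, num_edges):
    """Toy fingerprint: digest over mnemonic counts plus simple CFG shape numbers.

    Assumption for illustration only; real frameworks use far richer features.
    """
    counts = Counter(mnemonics)
    # Sort the counts so the digest does not depend on instruction order.
    canonical = ",".join(f"{m}:{c}" for m, c in sorted(counts.items()))
    canonical += f"|blocks={num_blocks}|edges={num_edges}"
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical database that maps fingerprints to known function names.
known = {
    fingerprint(["push", "mov", "call", "pop", "ret"], 1, 0): "libfoo:init",
}


def lookup(mnemonics, num_blocks, num_edges, database):
    """Return the known function name for an exact fingerprint match, else None."""
    return database.get(fingerprint(mnemonics, num_blocks, num_edges))


if __name__ == "__main__":
    # Same instructions in a different order still map to the same fingerprint.
    print(lookup(["mov", "push", "call", "ret", "pop"], 1, 0, known))
```

Ignoring instruction order is a deliberate simplification here: it tolerates reordering by the compiler, at the cost of more accidental collisions between unrelated functions.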
Another great feature is source and binary clone detection. Binsign distinguishes multiple clone types: not only exact matches, but also clones based on similar structures or semantics.
According to the authors, Binsign is very fast and produces very reliable results with a great level of detail in its findings.
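The difference between an exact clone and a clone with a similar structure can be sketched as follows. This is an illustration under stated assumptions, not the method from the paper: it compares functions by the Jaccard similarity of their instruction bigrams, and the names `ngrams`, `jaccard`, and `classify_clone`, as well as the threshold of 0.4, are hypothetical choices.

```python
def ngrams(seq, n=2):
    """Return the set of n-grams (as tuples) of an instruction sequence."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_clone(f1, f2, threshold=0.4):
    """Classify two mnemonic sequences as 'exact', 'similar', or 'different'.

    The threshold is an arbitrary value chosen for this example.
    """
    if f1 == f2:
        return "exact"
    score = jaccard(ngrams(f1), ngrams(f2))
    return "similar" if score >= threshold else "different"


if __name__ == "__main__":
    original = ["push", "mov", "add", "mov", "pop", "ret"]
    variant = ["push", "mov", "sub", "mov", "pop", "ret"]
    print(classify_clone(original, original))
    print(classify_clone(original, variant))
    print(classify_clone(original, ["nop", "jmp"]))
```

In practice the threshold would have to be tuned on labeled data, since a value that is too low reports unrelated functions as clones.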
Diving deeper
Since the first resource is a relatively short paper that highlights the main components of Binsign, you might be interested in more advanced topics. The excellent book “Binary Code Fingerprinting for Cybersecurity”, written by a large group of authors, covers these.
In short, the book covers the following topics:
- A great introduction to the subject, with plenty of examples that help to clarify this area of interest
- A comparison and review of existing digital fingerprinting frameworks
- Specific areas of fingerprinting frameworks that matter when evaluating the results
Besides these core topics, the book also covers several deeply technical methods for analyzing fingerprints. Some of these are technology-agnostic, others are tied to a specific technology. This makes it an interesting starting point to reflect on the tools and techniques you might already use in your day-to-day tasks.
Open source solutions
With the theory in mind, it’s nice to move on to more practical approaches that enable you to take action immediately. The whitepaper “A Five-Step Compliance Process for FOSS Identification and Review” acts as a starting point to get the right context. From here, you can download a thorough study. It describes a system called FOSSIL that helps you to identify FOSS functions and packages in malware binaries. The study specializes in function identification when no source code is available, and it also helps to detect obfuscation techniques. One of its strong points is that the system uses three data sets with real-world projects.
The study describes the following challenges, which apply equally to other tools and techniques:
- The system should give immediate insight when it detects FOSS packages, so that reverse engineers can start their investigations right away. This high level of usability makes the system very efficient. Promptly extracting, indexing, and matching features from binaries helps to process the list of findings quickly.
- Furthermore, the system should be robust: a small change to the compiler settings, a small structural change, or different formatting of the source code should not change the findings if the content itself is identical. Simply put: obfuscation may distort the features, but this should not affect the outcome.
- Scalability and stability are further challenges: the system should perform very well since it needs to deal with a large dataset. Often, there are millions of functions to scan, spread across many different packages with numerous versions each. In addition, an upgrade of the system itself should not require re-indexing the database of packages.
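The robustness and scalability challenges above can be illustrated with a small sketch. It is not how FOSSIL works internally; it only shows two generic techniques under simplifying assumptions: normalizing operands so that a different register or constant does not change a feature, and an inverted index so that a query only touches functions that share at least one feature. The regular expressions cover only a handful of x86 register names, and `normalize`, `build_index`, and `candidates` are hypothetical names.

```python
import re
from collections import defaultdict

# Simplified patterns: a few x86 register names and decimal/hex constants only.
REGISTER = re.compile(r"\b(?:e|r)?(?:ax|bx|cx|dx|si|di|sp|bp)\b")
IMMEDIATE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")


def normalize(instruction):
    """Replace concrete registers and constants with placeholders."""
    text = instruction.lower()
    text = REGISTER.sub("REG", text)
    text = IMMEDIATE.sub("IMM", text)
    return " ".join(text.split())  # collapse repeated whitespace


def build_index(packages):
    """Map each normalized feature to the (package, function) pairs containing it."""
    index = defaultdict(set)
    for owner, instructions in packages.items():
        for instruction in instructions:
            index[normalize(instruction)].add(owner)
    return index


def candidates(index, instructions):
    """Rank known functions by the number of normalized features they share."""
    votes = defaultdict(int)
    for feature in {normalize(i) for i in instructions}:
        for owner in index.get(feature, ()):
            votes[owner] += 1
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical package data, invented for this example.
    packages = {
        ("zlib-1.2", "crc"): ["mov eax, 1", "xor eax, ebx", "ret"],
        ("openssl-3", "md5"): ["push ebp", "ret"],
    }
    index = build_index(packages)
    print(candidates(index, ["MOV ecx, 7", "xor edx, esi", "ret"]))
```

Because the index is keyed on normalized features rather than on the matching logic, the ranking code can change without rebuilding the index, which is one way to approach the re-indexing concern mentioned above.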
Summarizing
Besides the challenges, the study also describes the threat model that is used, gives an overview of the system itself, and dives deeper into the details. It presents the results of an actual case study, followed by the current limitations and suggestions for further work to cover what is still missing.
All in all, it is a valuable resource for learning more about digital fingerprinting within the scope of open source software.
Conclusion
In this article, we explored several resources that cover specific aspects of digital fingerprinting. This advanced security topic is used to detect malware and invalid licenses and to conduct digital forensics. It’s rather complex to understand, but luckily there are plenty of (open source) systems such as FOSSIL to guide you.