Open Source Scanning: A Technical Perspective on Which Files to Scan

Posted by Dave McLoughlin on January 23rd, 2012 in Scanning & Provisioning

When preparing to scan your application development projects for open source software, one simple approach is to point your scanner at the root directory of your development system. However, that is probably not the most efficient approach, because the results may include many open source components that are not actually part of your application. Worse, the scanner may miss components that you ship but that are not present in the build environment. There are many reasons to be careful and selective about what you scan and why. Here’s a short list of considerations to keep in mind when preparing to scan and determine the open source used in your application.

Binaries vs. Source Code

When do you have to supply binaries in your scanning effort? Open source scanning tools like OSS Deep Discovery can scan and find snippet-level matches within source files. If you are using open source libraries you may think that simply providing source code is sufficient, but here are a few rules to consider when deciding whether to include binaries.

1) If you only have compiled versions of some libraries, you may have no option: you have to include the binaries. But compiled versions of open source libraries can easily be obfuscated and may not be recognized by the scanner. In cases like this, if you know of binary-only libraries in your code, it is in your best interest to try to find and download the original source. You will have a much better chance of getting accurate results, and the scanner may even find some additional open source components you didn’t know were used in the original open source library.

2) If you have the source code to all binaries you provide in your final application, then there is little reason to include binaries. They may only complicate the results and make reconciling the scan more difficult.

3) There are some circumstances in which you may want to include binaries even when you also have all the source available. On Linux systems, for example, running “ldd” on a binary lists the shared libraries it is dynamically linked against, which shows how your code is linked to standard open source libraries in the operating system. This linking information can help identify license obligations that are triggered by the combination or linking of programs.
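To make the “ldd” point a little more concrete, here is a minimal Python sketch that runs ldd on a binary and collects the shared libraries it resolves to. The function names are my own for illustration, and the parser assumes the common glibc output format of "name => path (address)"; other platforms may print something different. Only run ldd on binaries you trust, since on some systems it can execute code from the file it inspects.

```python
import re
import subprocess

# Matches lines such as:
#   libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f12)
# Lines without "=>" and "not found" lines are deliberately skipped.
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def parse_ldd_output(text):
    """Return {library name: resolved path} parsed from ldd output."""
    libs = {}
    for line in text.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            libs[match.group(1)] = match.group(2)
    return libs


def linked_libraries(binary_path):
    """Run ldd on a binary and return its resolved shared libraries."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=True
    )
    return parse_ldd_output(result.stdout)
```

The resulting library names can then be cross-checked against the licenses of the corresponding packages on your system, which is the part that still needs a human or a compliance tool.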

Build vs. Runtime Components

While it is much easier to just provide everything when running a scan, you may want to think about runtime vs. build-time components. Here are a few examples of where this can be important.

1) Build components are often licensed under the GNU General Public License (GPL). Scans that include build-time components (that do not get distributed with your application) may turn up matches that are hard to explain to your legal department or compliance group. For example, if you are careful not to use GPL code for policy reasons on a commercial application, but a scanner shows several matches to GPL-licensed code, you then need to help your less technical folks understand why these components are not distributed, why it’s not a compliance issue, or why GPL is in your code when you said it wasn’t.

2) There are times when simply including everything is a good thing. It helps confirm which open source components get distributed with your application, so you can make sure you don’t accidentally ship something that triggers unintended license obligations.
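If you do scan everything, one way to keep the conversation with your compliance group manageable is to separate matches found in build-only directories from matches in code you actually distribute. The following Python sketch assumes a hypothetical scan report reduced to (path, license) pairs and a list of directories you know are build-time only; neither is the output format of any particular scanner.

```python
from pathlib import PurePosixPath


def split_matches(matches, build_only_dirs):
    """Split (path, license) matches into (distributed, build_only) lists.

    A match is build-only when its path is one of the given directories
    or lies underneath one of them.
    """
    build_roots = [PurePosixPath(d) for d in build_only_dirs]
    distributed, build_only = [], []
    for path, license_name in matches:
        p = PurePosixPath(path)
        if any(root == p or root in p.parents for root in build_roots):
            build_only.append((path, license_name))
        else:
            distributed.append((path, license_name))
    return distributed, build_only
```

Comparing whole path components, rather than string prefixes, keeps a directory such as "toolsmith" from being mistaken for something under "tools".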

Additional Components

A common mistake I see people make when preparing to scan their projects for open source is to accidentally leave components that include open source out of the scan. Here are a few considerations to keep in mind when preparing to scan your code.

1) Does your build process download and use components that are not in your source code repository? If additional code is downloaded at build time, make sure that those additional components are included when you run the scan. This is particularly important if you use tools like Maven.

2) Does your product rely on additional components that are not part of the build environment you are scanning? This may seem like an obvious question, but sometimes we get so close to our work that we miss the obvious. When preparing files to scan, double-check that you have included all the components you ship.
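A simple sanity check for both questions is to compare what you ship against what you are about to scan. The Python sketch below is one rough way to do that, assuming you can point it at an unpacked distribution directory and at the directory you plan to scan; the function names and the match-by-file-name rule are illustrative assumptions, not a feature of any scanner. With Maven, a step such as the dependency plugin's copy-dependencies goal can be used to gather downloaded artifacts into a directory first.

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root, with '/' separators."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def missing_from_scan(shipped_names, scanned_names):
    """Return shipped paths whose file name appears nowhere in the scan input, sorted."""
    scanned_basenames = {name.rsplit("/", 1)[-1] for name in scanned_names}
    return sorted(
        name
        for name in shipped_names
        if name.rsplit("/", 1)[-1] not in scanned_basenames
    )
```

Calling missing_from_scan(relative_files(dist_dir), relative_files(scan_dir)) gives a list of shipped files worth a second look. Matching on file name alone is crude, so treat the result as a prompt for review rather than a definitive answer.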

Performing an open source scan and audit of your code is an important component of the modern software build process. Keeping this relatively simple set of considerations in mind can help make the process more efficient and easier to manage.



This work is licensed under a Creative Commons Attribution 3.0 Unported License
