As an open source auditor at OpenLogic, I have had my fair share of challenges when conducting source code audits. Your first few audits can seem cumbersome and intimidating, with problems ranging from identifying different open source package versions to being given incorrect information by developers. To help with these issues, consider the following.
Three things you should always consider:
Confirm package and license: Different versions of a package sometimes carry different licenses, so just knowing that a certain package is present isn't enough. I tackle this by visiting the project website and, if necessary, downloading the open source package and comparing my source code against it. This tactic also helps when you come across potential bundled dependencies that you are calling into question.
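The comparison step can be partly automated. Below is a minimal sketch, assuming you have already downloaded and unpacked the upstream release next to the code under audit and that both trees share the same relative layout. The function names and the three result buckets are my own illustration, not part of any scanner.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compare_trees(local_dir, upstream_dir):
    """Classify each file under local_dir against the same relative path upstream.

    Returns a dict with three sorted lists of relative paths:
    'identical', 'modified', and 'local_only'.
    """
    local_root, upstream_root = Path(local_dir), Path(upstream_dir)
    result = {"identical": [], "modified": [], "local_only": []}
    for path in sorted(local_root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(local_root)
        candidate = upstream_root / rel
        if not candidate.is_file():
            # Nothing at this path upstream: possibly in-house or from elsewhere.
            result["local_only"].append(rel.as_posix())
        elif file_digest(path) == file_digest(candidate):
            result["identical"].append(rel.as_posix())
        else:
            # Same path, different bytes: a different version or a local change.
            result["modified"].append(rel.as_posix())
    return result
```

Files that land in "modified" are the ones worth a closer look, since they may come from a different version with a different license.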
Review all matches, even if they appear to be false positives: This step is crucial, especially when you are working with an unfamiliar code base, because you have no way of knowing what open source software developers added, intentionally or unintentionally. From time to time the scanner will report what look like false positives, but a false positive does not necessarily mean the scanner was wrong. Open source projects use code from other projects fairly often, so a scanner can find several matches where only one is correct.
For example, you are working on an audit and you see a file that appears to belong to two different projects, one licensed under the MIT License and the other under the GPL. You do not know which license applies, and you cannot contact the developer for clarification. Now what? You might do the following:
Manually review the source code to check for any copyright notices, author names, or other hints that will shed light on the origin.
If that doesn't help, review the files it matched on the open source side. Check for other hints, such as matching comments or URLs, that lead to more information and ideally the correct answer.
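The manual review above can be given a head start with a small script that pulls out the lines most likely to carry origin hints. This is only a sketch: the patterns are my own rough guesses at what a copyright notice, author tag, license mention, or URL looks like, and they will miss some notices and flag some irrelevant lines, so the output is a reading list, not a verdict.

```python
import re

# Illustrative patterns only; tune them for the code base being audited.
HINT_PATTERNS = {
    "copyright": re.compile(r"\bcopyright\b|\(c\)\s*\d{4}", re.IGNORECASE),
    "author": re.compile(r"@author\b|\bauthors?:", re.IGNORECASE),
    "license": re.compile(r"\blicen[sc]e\b", re.IGNORECASE),
    "url": re.compile(r"https?://\S+"),
}


def find_origin_hints(text):
    """Return (line_number, kind, stripped_line) for every line matching a pattern."""
    hints = []
    for number, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in HINT_PATTERNS.items():
            if pattern.search(line):
                hints.append((number, kind, line.strip()))
    return hints
```

Running it over both the audited file and each upstream candidate lets you compare the hints side by side.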
Document your findings: This is especially important if your findings require manual investigation.
In time, one of your resolutions will be called into question and you will NOT remember why you made that decision. You probably won't even remember what research you did, even though it seemed quite memorable at the time! If there was one thing I could tell new auditors, this would be it. Being organized is key: it speeds up the process and lowers your chance of mistakes, which in turn lowers the chance of compliance failure. I find that keeping all my information (notes, resolutions, a list of identified packages and licenses) centrally located in OLEX has helped reduce my errors. In the future you may find yourself auditing projects that contain 2,000 to 500,000 files. Do you really expect to remember how you identified every file? Having the correct notes will either remind you why you made a decision or provide you with the road map to find that answer again.
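If you do not have a tool that stores this for you, even a flat file is better than memory. The sketch below shows one way to record a resolution together with the evidence behind it. The field names are hypothetical and are not OLEX's format; the point is simply that every decision carries its reason and its date.

```python
import csv
from dataclasses import asdict, dataclass, fields


@dataclass
class Resolution:
    """One audit decision; all field names are illustrative."""
    path: str        # file or directory the decision covers
    package: str     # open source package it was attributed to
    version: str
    license: str
    evidence: str    # what you checked and what you found
    decided_on: str  # ISO date, e.g. "2012-05-14"


def write_resolutions(resolutions, stream):
    """Write resolutions as CSV with a header row to an open text stream."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(Resolution)])
    writer.writeheader()
    for resolution in resolutions:
        writer.writerow(asdict(resolution))
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends.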
Three things you should never do:
Assume the information you have been given is correct. When open source is being downloaded from all over the Internet, you never know whether what you're getting comes from a credible source. More than once I have looked at a project's website and found that it listed a different version of a license than the one in the project's code.
Skip files that have no matches. At a minimum, "eyeball" file names and path names for common open source projects before assuming the code is in-house. If I come across a directory with no matches that seems to be in-house code, I might still spot check a few files for anything that makes me suspect open source. Open source is produced at such a fast rate that nobody can keep up with tracking it, yet we still have to make sure we are complying.
Assume that because a directory is mostly one package, all files in the directory belong to that package. As stated earlier, many open source projects use code from existing open source projects, sometimes under different or conflicting licenses. Sometimes the difference is only a few lines of code in a file, or even a comment in the header. OLEX is one option to help identify when these issues occur.
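The "eyeball" pass over file and path names can be roughed out in a few lines. This is a sketch under a big assumption: the short list of project names below is just an example I picked, and a plain substring test will produce both misses and false alarms, so treat hits as prompts for a manual look.

```python
# Example names only; extend with whatever is common in your environment.
KNOWN_NAMES = ("jquery", "zlib", "log4j", "sqlite", "openssl")


def suspicious_paths(paths, known_names=KNOWN_NAMES):
    """Map each path containing a known project name to the names it contains."""
    flagged = {}
    for path in paths:
        lowered = path.lower()
        hits = [name for name in known_names if name in lowered]
        if hits:
            flagged[path] = hits
    return flagged
```

Paths that come back empty are not cleared by this check; they still deserve the spot check described above.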
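One way to catch the odd file out in a directory is to compare each file's match against the majority for that directory. The sketch below assumes you already have a per-file result of the form (package, license) exported from whatever scanner you use. That input shape and the function name are my own assumptions, not a real tool's output.

```python
import posixpath
from collections import Counter, defaultdict


def directory_outliers(file_matches):
    """Find files whose match differs from their directory's most common match.

    file_matches maps a '/'-separated path to a (package, license) tuple.
    Returns {directory: {"majority": match, "outliers": [paths]}} for
    directories with at least one differing file. On a tie, the match seen
    first is treated as the majority.
    """
    by_dir = defaultdict(dict)
    for path, match in file_matches.items():
        by_dir[posixpath.dirname(path)][path] = match

    outliers = {}
    for directory, matches in by_dir.items():
        majority, _count = Counter(matches.values()).most_common(1)[0]
        different = sorted(p for p, m in matches.items() if m != majority)
        if different:
            outliers[directory] = {"majority": majority, "outliers": different}
    return outliers
```

This works only at whole-file granularity, so a few borrowed lines inside an otherwise matching file will still need a scanner or a manual diff.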
This is what I have done when conducting an audit. What is your method?
Nathan Knowles | Open Source Auditor | OpenLogic, Inc. Follow @NKnowlesGIS