
Open Source Code Scanning with “Noise Reduction” & Multiple Matching Techniques

  
  
  

Commercial source code scanning tools have become a hot topic for CIOs, software development managers, in-house counsel, and enterprise architecture teams over the last eight to ten years. The emergence of these technologies correlates directly with the maturity of open source software, which is now just as common as commercially licensed software in medium to large enterprise data centers. Open source is also widely distributed in consumer products, which makes source code scanning a critical risk mitigation measure for any company that buys or sells modern technology. Today’s article will briefly explain “noise reduction” and the process of using multiple matching techniques in a source code scanning tool.

3 and a Half Reasons You Really Need to Scan for Open Source Software

  
  
  

At a basic level, OSS scanners, such as OpenLogic's OSS Deep Discovery, analyze software development projects looking for components that come from OSS projects. They tie their results to in-depth information about the open source projects, licensing information and even project support. If you're a developer or a project manager, here are some reasons you might want to run one on your project.

SaaS CRM and SaaS Open Source Management & Scanning: Any Difference?

  
  
  

Using software-as-a-service (SaaS) open source code scanning and policy management tools is not that different from maintaining sensitive corporate customer data in a SaaS CRM solution. Yet the latter sees mass adoption while the former still raises security questions, even though the same concerns are relevant to both.

Open Source Software Compliance: Developing a Risk Matrix

  
  
  

In my last article, Open Source Software Compliance: How Well Are You Rating Risk?, we took a look at the key factors in determining risk associated with the use of Open Source Software (OSS). In this article I will be discussing how you can use those factors to develop a risk matrix to assist you in assessing your overall risk.

OSS Provisioning for Origin, Safety, and Maturity of a Community

  
  
  

A critical consideration in a corporate open source software provisioning strategy is the maturity of the community behind a project and the likelihood that the community will continue to develop it.

Open Source Software Compliance: How Well Are You Rating Risk?

  
  
  

Many organizations have begun to adopt a “risk rating” as part of their open source software compliance and usage discussion. Some of the information gathering requirements to assess risk will be relatively easy to meet, while others require much more effort. There are many factors to consider when assessing risk, and as you decide which factors are important to your organization, you will need to weigh the time investment needed to research and obtain the information associated with each factor.

Source Code Scanning for OSS Dependencies and Why

  
  
  

Open source application audits using source code scanning tools are a critical part of a corporate open source software policy management and governance process; there is no practical way around it these days. Without a scanning tool, organizations must rely on homegrown tools, manual inspection and inventory of source code repositories, and developer interviews to implement the governance process. In our experience, even with full disclosure of open source usage from very honest and open development teams, things slip through the cracks. And, let’s face it, manual inspection of source code is painfully slow. Homegrown tools might be a realistic approach for larger companies, but they require the allocation of internal resources, not only to use the tools but also to maintain and update them regularly.

Most open source auditing engagements involve scanning the code base of a product line to confirm that a company has appropriately separated its intellectual property from third-party components. When third-party components are used and distributed, all licenses for these components need to be identified, and there needs to be confirmation that appropriate license compliance steps have been taken. OpenLogic’s Application Audit and Certification of Compliance services are one option to consider when outsourcing to a team of experts: they provide a full report of all materials and licenses, and a re-verification that compliance steps have been completed.

Dependency Scanning Use Case

Depending on the industry and the maturity of the open source policy management process, a more granular level of scanning may be needed. Open source packages often bundle other open source software within the larger, or parent, project. Some companies want to know not just which open source projects are included in their code, but also which sub-components or dependencies belong to each parent project. Open source communities come in all shapes and sizes, with varying degrees of attention to documenting dependencies. In fact, not all open source communities accurately disclose and update the dependent libraries that their projects use. Significant changes from version to version can leave a previously accurate list of dependencies partially incorrect for the newest version. Consequently, what was once a pre-approved version of an open source project for use in a distributed code base could easily become a policy violation, and a potential license violation, in the next version.

If you are familiar with OSS development and license types, you know that a single file can make a very big difference. For example, in one of our scans the OpenLogic audit and IP analysis team found a license conflict between source code components in an open source project. We contacted the community to inform them of the conflict, as they were not aware it existed. The community acknowledged that someone had contributed code that created the conflict, and it did the right thing for its end users by removing the conflicting code and replacing it.

If you scan and analyze the open source project code directly, you can determine all the dependencies used by a specific version. For example, if an organization's policy states that the most recent version of Zlib must be used, the organization would complete a scan to find out whether anything has changed from version to version. As a result, the organization can confidently tell customers, investors, acquiring companies, and others: “Yes, we ship the Zlib library with our product, we always ship the most recent version of Zlib, and we can tell you exactly what Zlib is using in the newest version. Would you like to see it?” The company would then present the most recent Zlib Bill of Materials and Licenses to the audience.
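The version-to-version check described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the component names, the license labels, and the `APPROVED_LICENSES` policy set are all hypothetical, and the two dictionaries stand in for bills of materials that a real scan would produce.

```python
# Hypothetical bills of materials for two versions of an imaginary library:
# component path -> license label. Not the output of any real scan.
OLD_BOM = {"core": "MIT", "vendor/compress": "Zlib"}
NEW_BOM = {"core": "MIT", "vendor/compress": "Zlib", "vendor/parser": "GPL-2.0-only"}

# Assumed policy: licenses the review board has pre-approved.
APPROVED_LICENSES = {"MIT", "Zlib", "Apache-2.0"}


def diff_bom(old, new):
    """Return components added, removed, or relicensed between two versions."""
    added = {name: lic for name, lic in new.items() if name not in old}
    removed = {name: lic for name, lic in old.items() if name not in new}
    relicensed = {
        name: (old[name], new[name])
        for name in old.keys() & new.keys()
        if old[name] != new[name]
    }
    return added, removed, relicensed


def policy_violations(bom, approved):
    """List components whose license is not on the approved list."""
    return sorted(name for name, lic in bom.items() if lic not in approved)


added, removed, relicensed = diff_bom(OLD_BOM, NEW_BOM)
print("added:", added)
print("removed:", removed)
print("relicensed:", relicensed)
print("violations:", policy_violations(NEW_BOM, APPROVED_LICENSES))
```

Running the same comparison on every new release is what turns a one-time approval into an ongoing check: a component that appears only in the newer bill of materials is flagged before it ships.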

The OSS Deep Discovery scanning tool has a customizable setting for this exact situation, which reduces the number of false positives in the initial results. In other words, with the settings adjusted accordingly, the scanner will identify all the components inside of Zlib instead of simply reporting that you have matches to Zlib.

The real-world case for this level of diligence goes back to having and managing an actionable open source policy. Open source review boards that hold monthly, bi-monthly, weekly, or even impromptu daily meetings about what can and cannot be used, and under what conditions, need the ability to quickly identify and document these occurrences, make decisions, implement critical policy rule changes, and communicate all of this easily to the organization. One new or changed file can make a big difference in protecting millions of dollars of development and intellectual property.














Why You Should be Using SPDX for Open Source License Compliance

  
  
  

The Software Package Data Exchange (SPDX) standard is getting some love lately, and this is good news for open source license compliance, which is, in turn, good for open source in general. If you are involved in software license compliance activities, you need to include SPDX in your plans for the future. It will allow you to manage the risks of software licensing in a more efficient and predictable way than ever before.

SPDX defines a standard way to represent the contents and licensing of software packages. This standard representation provides a shared vocabulary for tools involved in managing license compliance. The SPDX standard is being developed under the auspices of The Linux Foundation as a way to ease compliance with the licenses of open source software. The model provided by SPDX is also fully compatible with proprietary software licensing. This means that SPDX provides a uniform way to represent the licensing of any software package. Being able to treat both open source and commercial software the same way allows license compliance processes and tools to be simplified and streamlined.

Reuse


A key value of SPDX is the reuse it can facilitate. Once a package has been analyzed, an SPDX file can be saved containing that information. The next time that package is encountered, rather than redoing the analysis, the previously saved SPDX file can be used. The shared vocabulary provided by SPDX means that other tools, organizations and people will be able to understand the information. This sharing could be purely internal if your organization maintains a library of SPDX files for packages it has seen in the past. Or the sharing could extend across multiple organizations if, for example, the supplier of a package provides SPDX files to purchasers so that they know how to comply with the licensing of that component. You could even get SPDX data for open source packages from an independent third party. We recently published SPDX files for six popular open source packages. That data is free for anyone to use. Feel free to download and use them to streamline your license compliance process.
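The reuse idea can be illustrated with a small cache. This is a minimal sketch, not an SPDX implementation: it assumes saved analyses are kept in an in-memory dictionary keyed by a SHA-1 checksum of the package contents, and `analyze` is a hypothetical placeholder for whatever expensive license analysis you actually run.

```python
import hashlib

# In-memory stand-in for a library of saved analyses, keyed by package checksum.
_saved_analyses = {}


def package_checksum(data):
    """Return the SHA-1 hex digest of the package bytes, used as the lookup key."""
    return hashlib.sha1(data).hexdigest()


def analyze(data):
    """Hypothetical placeholder for an expensive license analysis."""
    return {"size": len(data), "licenses": ["UNKNOWN"]}


def get_analysis(data):
    """Return (analysis, reused); reused is True when a saved result was found."""
    key = package_checksum(data)
    if key in _saved_analyses:
        return _saved_analyses[key], True
    result = analyze(data)
    _saved_analyses[key] = result
    return result, False


first, reused_first = get_analysis(b"example package contents")
second, reused_second = get_analysis(b"example package contents")
print(reused_first, reused_second)
```

Keying on a checksum rather than a file name means a renamed copy of the same package still hits the saved result, while a modified package is analyzed again.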

Another level of reuse supported by SPDX is the license list. This list is a curated set of common open source licenses. The list exists primarily so that an SPDX consumer can be sure they know what it means when an SPDX file states that a package is licensed under, for example, the GPL version 2. Licenses on the list are given a unique, permanent short name and a permanent URI. These identifiers provide an unambiguous way to communicate about which licenses are used by a package. The license list also indicates when a license appears in other license lists. This can be helpful when working with legacy data or with communities that maintain their own license lists. Using the license list as your primary repository of license information is often a simple, highly effective way to realize the value of the SPDX standard.
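One practical use of short identifiers is normalizing the free-form license names found in legacy data. The sketch below assumes a small, hand-written alias table; the aliases are hypothetical, and the identifiers on the right follow the naming of the current SPDX License List as I understand it, so check them against the published list before relying on them.

```python
# Hypothetical alias table: free-form license names -> short identifiers.
# The identifiers should be verified against the published SPDX License List.
ALIASES = {
    "gpl version 2": "GPL-2.0-only",
    "gnu general public license v2": "GPL-2.0-only",
    "apache license 2.0": "Apache-2.0",
    "apache 2": "Apache-2.0",
    "mit license": "MIT",
}


def to_short_id(name):
    """Map a free-form license name to a short identifier, or None if unknown."""
    # Lowercase and collapse runs of whitespace so minor variations still match.
    key = " ".join(name.lower().split())
    return ALIASES.get(key)


for raw in ["GPL Version 2", "  apache   license 2.0 ", "Some Custom License"]:
    print(repr(raw), "->", to_short_id(raw))
```

Returning `None` for an unrecognized name, rather than guessing, leaves the unknown cases visible for manual review.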

Automation


The automation potential of SPDX is the aspect I find most compelling. I am a software developer, so that is not particularly surprising. The superb machine processability of SPDX data opens the door to huge improvements in license compliance processes. Imagine being able to amalgamate various existing SPDX files for libraries you use, spot check them for correctness, run a scanner on your code, merge that information into the SPDX data, and then feed the result into a tool that gives you a simple checklist of things to do before shipping your software. Now imagine being able to do that with minimal manual effort. That is the promise of SPDX: all of the tools you use speak the same language, so you can easily integrate the best tools available, whether they are open source, commercial or custom built. With SPDX 1.0 we have the technology we need to make that dream possible. Already most vendors in the license compliance space support SPDX. The SPDX working group also provides a great set of open source tools for working with SPDX data. This means that we can start automating and streamlining the compliance process today.
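The "amalgamate, then merge scanner results" step might look roughly like the following. This is a sketch under simplifying assumptions: plain dictionaries of package name to license identifier stand in for parsed SPDX data, the package names are made up, and no real SPDX parsing library is used.

```python
def merge_license_records(*sources):
    """Merge {package: license} mappings from several sources.

    The first license seen for a package is kept; any later disagreement is
    recorded in `conflicts` so a person can spot check it.
    """
    merged = {}
    conflicts = {}
    for source in sources:
        for package, license_id in source.items():
            if package in merged and merged[package] != license_id:
                conflicts.setdefault(package, {merged[package]}).add(license_id)
            else:
                merged[package] = license_id
    return merged, conflicts


# Hypothetical inputs: previously saved data and a fresh scan of your own code.
saved_data = {"libalpha": "MIT", "libbeta": "Apache-2.0"}
scan_results = {"libbeta": "GPL-2.0-only", "libgamma": "BSD-3-Clause"}

merged, conflicts = merge_license_records(saved_data, scan_results)
print("merged:", merged)
print("needs review:", sorted(conflicts))
```

Surfacing disagreements instead of silently overwriting them keeps the spot-check step from the paragraph above in the loop.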

Future proofing


Vendor neutrality is another important feature of SPDX. This is important regardless of whether you use bespoke tools, the tools of a single vendor, a mix of tools created by various vendors, or no tools at all. The shared vocabulary provided by SPDX means that you are never locked in to a particular tool. If you find a new tool to improve your compliance efforts, you can take all the data you already have and import it, or take its output and use that data with your existing tools. Even if you currently have a completely manual process, SPDX still has potential benefits. Using the open source tools provided by the SPDX working group today means that you can easily move to a more automated process in the future with minimal effort. The freedom provided by SPDX is hugely valuable.

As you can see, SPDX is a giant leap forward for license compliance. It provides capabilities for improving the quality and reducing the difficulty of license compliance efforts. These benefits come from the ability to reuse previously completed work, automate the process more aggressively and avoid vendor lock-in. At OpenLogic we are strong supporters of the SPDX effort, both by contributing significantly to its development and by supporting SPDX data directly in our scanning and compliance tools and services. Improving open source license compliance is better for everyone, and SPDX provides a real way to achieve that goal today.







This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Open Source Software Management: A Recap of the Top Articles

  
  
  

 

Open Source Management: Dealing with New OSS Releases


The first quarter of this year has been a busy time in open source management. JBoss has had two releases in the 7.1 series, the Apache web server has had two releases in the 2.4 series and Ruby on Rails has had two releases in the 3.2 series, just to name a few. This may sound like a flurry of new releases, but it is really par for the course. In the open source world releases happen all the time. Most open source projects take the “release early, release often” motto to heart. And for good reason: it results in better software.

Read the full article

 

Apache HTTP Server: New Features for Version 2.4


The Apache Software Foundation released Apache HTTP Server 2.2.0 at the end of 2005. Now, 7 years later, there is a new major release of Apache HTTP Server. Apache HTTP Server currently has 65% market share according to Netcraft. There have always been two competitors in the web space, Apache and IIS, but in late 2007 Nginx was born and has been grabbing more and more market share every day. Looking at the release notes for Apache 2.4, you can see that this release has a few features that match Nginx’s feature set. Apache HTTP Server 2.4 has included something for everyone: performance increases, lower memory usage, new modules, program enhancements and new features for old modules.

Read the full article

 

Cloud Technology, Big Data & Hadoop


Big Data seems to be the word of the year. Everywhere you look there is Big Data staring you in the face: blogs dedicated to discussing Big Data, LinkedIn groups with over 10,000 members focusing on Big Data topics. Just under 15,000 times each month, the exact phrase “Big Data” is entered into search engines by people across the world.

Read the full article

 

Preparing for Your First Cloud App


There’s a lot of confusion out there around the so-called “cloud app”.  What is it, just another term for “SaaS”?  Or does it refer to running your own application in a public cloud?  As with many phrases that include the ubiquitous word “cloud”, it can mean just about anything.  In the context of this post, “cloud app” refers to an application you’re running in a public cloud, such as Amazon AWS or Rackspace Cloud.

Read the full article

 

5 Ways an Open Source Governance Process Can Improve Your Organization


Is one of your resolutions for the new year to create an enterprise open source governance process for your organization, or review and update your existing governance process? If your organization doesn’t already have an open source governance process, this should definitely be on your list of goals for 2012. Likewise if you have a governance process that’s outdated, incomplete, or inconsistently implemented throughout the organization.

Read the full article







Creating an Open Source Compliance Checklist

  
  
  

In a recent blog article, Using Categorization to Simplify Open Source License Compliance, I talked about simplifying open source compliance through license “categorization” and listed the common categories used in many open source licenses. In this article I’m going to talk about creating an open source compliance checklist based on those categorizations.

In OpenLogic Exchange Enterprise Edition (OLEX), we have analyzed several hundred open source licenses and created a list of high-level obligations for each license. For example, in OLEX the Apache License 2.0 list of obligations looks like this:

• Distribute copy of license
• Give notice of or fulfill other requirements related to modified files
• Obligation to include notice text or files
• Obligation to include copyright or trademark notice
• Obligation to indemnify contributors
• Obligation to apply license to original or derivative works
• Restrictions regarding use of trademark
• Termination of patent license upon filing of patent litigation

If you don’t have the luxury of an OLEX Enterprise Edition subscription, you will want to take a similar approach and summarize the list of obligations in easily digestible “chunks.” This will make it easier for the team to take steps to comply with the various licenses used.

Next you want to examine under what conditions you need to meet certain obligations. For example, the requirement to supply documentation about modifications to open source is only triggered when you modify the code. By creating a set of triggers tied to your list of obligations, you can quickly determine whether you need to take steps or not.

You can then use these sets of triggers to ask your development team how they are using the open source in your product. Once you have their answers in hand, you can begin to eliminate the obligations you don’t need to meet which in turn helps keep the list manageable.

Some common triggers include distribution, modification of source and creation of derivative works via linking. So, for example, you can confirm whether open source found in your source files is actually distributed with your product. It is surprising how many files that contain open source in your source tree don’t get distributed with the final product. Or, if your code links to an open source library, how so? Dynamically or statically? The answers to these questions allow you to pick the relevant obligations.
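Tying obligations to triggers lends itself to a simple filter. The sketch below is illustrative only: the obligation wording is abbreviated, and the assignment of each obligation to a trigger is a hypothetical assumption for the example, not a reading of any particular license.

```python
# Hypothetical obligations, each tied to the trigger that activates it.
OBLIGATIONS = [
    {"text": "Distribute copy of license", "trigger": "distribution"},
    {"text": "Include copyright notice", "trigger": "distribution"},
    {"text": "Give notice of modified files", "trigger": "modification"},
    {"text": "Apply license to derivative works", "trigger": "static_linking"},
]


def applicable_obligations(obligations, usage):
    """Keep only the obligations whose trigger is true in the team's answers."""
    return [ob["text"] for ob in obligations if usage.get(ob["trigger"], False)]


# Answers gathered from the development team (illustrative values).
usage_answers = {"distribution": True, "modification": False, "static_linking": False}

for text in applicable_obligations(OBLIGATIONS, usage_answers):
    print("-", text)
```

A trigger that is missing from the answers is treated as not applicable here; a stricter process might treat an unanswered question as applicable until someone confirms otherwise.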

Once you understand your list of high-level obligations by license and when those obligations are triggered, you can begin to build your checklist. When we build compliance checklists for our customers, we build the list by obligation type and license, then also list the responsible packages. For example, if we identify a modification requirement in a license, we list the products that have been modified (down to the file level), so that we can later verify that modification requirements have been met.
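A checklist organized by obligation type and license, with the responsible packages listed under each entry, could be assembled along these lines. This is a sketch with made-up findings; the package names, file path, and the `done` flag are hypothetical and do not reflect how OLEX stores its data.

```python
from collections import defaultdict

# Hypothetical findings: (obligation, license, responsible package or file).
FINDINGS = [
    ("Include notice text", "Apache-2.0", "libbeta"),
    ("Include notice text", "Apache-2.0", "libalpha"),
    ("Document modified files", "Apache-2.0", "libbeta/src/io.c"),
]


def build_checklist(findings):
    """Group findings by (obligation, license) and list the responsible items."""
    grouped = defaultdict(list)
    for obligation, license_id, item in findings:
        grouped[(obligation, license_id)].append(item)
    return [
        {"obligation": ob, "license": lic, "items": sorted(items), "done": False}
        for (ob, lic), items in sorted(grouped.items())
    ]


checklist = build_checklist(FINDINGS)
for entry in checklist:
    mark = "x" if entry["done"] else " "
    print(f"[{mark}] {entry['obligation']} ({entry['license']}): {', '.join(entry['items'])}")
```

Recording modified files at the file level, as in the last finding, is what makes it possible to verify the modification requirement later.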

Finally, once you have your checklist in place, you can work with your various functional teams to verify that all physical steps have been taken to comply. Has your documentation group updated the product documentation to include the appropriate trademark, copyright and attribution requirements? CHECK. Has your development team met all modification documentation requirements and made sure they didn’t remove any copyright statements? CHECK. Has your legal team reviewed your product EULA to make sure it doesn’t conflict with any of the open source licenses, is it fully aware of any restrictions on use, and has it agreed that your organization can meet these restrictions? CHECK. If you need to provide source code via physical medium or hosted download site, is your organization prepared to fulfill requests? CHECK.

The process of license compliance is not trivial but by taking advantage of license categorization, high-level lists of obligations and triggers, you can make the process manageable.

I invite you to comment and provide insights on how your organization complies with open source licenses. If you are interested in learning more about OLEX or how services provided by OpenLogic can assist you in your compliance, please feel free to contact us at sales@openlogic.com.











































Enterprise OSS Blog Policy

If you read a post on The Enterprise OSS Blog, please leave a comment. Let us know what you think, even if it's just a few words. Comments do not require approval, but they are moderated. OpenLogic reserves the right to remove any comments it deems inappropriate.

 
