Intellectual Property Today

Measuring Whitespace Patterns in Computer Source Code as an Indication of Copying



By Ilana Shay, Nik Baer, and Bob Zeidman of Zeidman Consulting

There are several methods of comparing the source code of different programs to find copying.¹ Perhaps the most common is comparing source code statements, comments, strings, identifiers, and instruction sequences. There are also anecdotal reports of whitespace patterns being used for this purpose. These virtually invisible patterns of spaces, tabs, and newlines have been used in litigation to imply copying, but no formal study has shown that they can actually identify copied code. This paper presents a detailed study of whitespace patterns and how unique they are across different programs. Our goal was to determine whether comparing the whitespace patterns of different files is a reliable way to measure code similarity and thus detect copying.

When writing code, the programmer focuses on the visual elements: statements, comments, variable names, and strings. While writing, the programmer also uses non-printing characters to separate the program's visual elements. These non-printing characters are spaces, tabs, and newlines, and their sequence within a file is its whitespace pattern.
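A minimal Python sketch of extracting a file's whitespace pattern, assuming the pattern is simply the in-order sequence of spaces, tabs, and newlines with every visible character dropped. The function name `whitespace_pattern`, the single-letter codes S/T/N, and the treatment of Windows CRLF line endings as one newline are illustrative assumptions, not details taken from the article.

```python
# Hypothetical single-letter codes, used only to make the pattern printable.
WS_CODES = {" ": "S", "\t": "T", "\n": "N"}


def whitespace_pattern(source: str) -> str:
    """Return the in-order sequence of spaces, tabs, and newlines in source.

    Statements, comments, identifiers, and strings are discarded; only the
    three non-printing characters survive, each encoded as S, T, or N.
    """
    # Assumption: a Windows line ending (CRLF) counts as a single newline.
    source = source.replace("\r\n", "\n")
    return "".join(WS_CODES[ch] for ch in source if ch in WS_CODES)


print(whitespace_pattern("int main() {\n\treturn 0;\n}\n"))  # SSNTSNN
```

Two spaces on the first line, then a newline, a tab, a space, and two more newlines: the code itself is gone, but the layout the programmer typed is preserved.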

We will score file pairs by the percentage similarity of their whitespace patterns...
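The excerpt does not say how the percentage is computed. As one assumption, the sketch below uses Python's standard `difflib.SequenceMatcher`, whose `ratio()` returns 2·M/T (M matched characters, T total characters in both sequences), scaled to 0 to 100. This is a swapped-in measure for illustration, not necessarily the scoring used in the study; `ws_pattern` and `whitespace_similarity` are hypothetical names.

```python
from difflib import SequenceMatcher


def ws_pattern(source: str) -> str:
    # Keep only spaces, tabs, and newlines, in order; CRLF counts as one newline.
    return "".join(ch for ch in source.replace("\r\n", "\n") if ch in " \t\n")


def whitespace_similarity(source_a: str, source_b: str) -> float:
    """Percentage (0 to 100) similarity of two files' whitespace patterns."""
    a, b = ws_pattern(source_a), ws_pattern(source_b)
    # autojunk=False matters here: the alphabet has only three symbols, so
    # difflib's "popular element" heuristic would otherwise treat every
    # character as junk in sequences of 200+ items and report ~0% similarity.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return 100.0 * matcher.ratio()


original = "int main() {\n\treturn 0;\n}\n"
renamed = "int start() {\n\treturn 1;\n}\n"        # identifiers and values changed
reformatted = "int main()\n{\n    return 0;\n}\n"  # same code, different layout

print(whitespace_similarity(original, renamed))  # 100.0
print(whitespace_similarity(original, reformatted))
```

The example shows why whitespace is attractive as evidence: renaming identifiers leaves the score untouched, while merely re-indenting the same code lowers it. One quirk of this particular measure is that two files containing no whitespace at all score 100, since `ratio()` returns 1.0 for two empty sequences.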


  © Copyright 2012 Intellectual Property Today