A user had left a comment on Slashdot that suggested birthmarking as a possible solution for detecting such code theft. A paper to be presented at the upcoming academic conference Automated Software Engineering describes a new method to detect code theft that could be used to detect GPL violations in particular. While the so-called birthmarking method is demonstrated for Java, it is general enough to work for other languages as well. The API birthmark observes the interaction between an application and the (dynamic) libraries that are part of the runtime system. This captures the observable behavior of the program and, as shown in the paper (PDF), cannot easily be foiled using code obfuscation techniques.
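To make the capture-and-compare idea concrete, here is a minimal Python sketch. It assumes a program run has already been recorded as a list of library calls (the trace format and call names are hypothetical), turns each trace into a set of k-grams of consecutive calls, and scores two birthmarks with Jaccard similarity. This is a common simplification chosen for illustration, not the exact birthmark or similarity measure defined in the paper.

```python
def birthmark(trace, k=2):
    """Return the set of k-grams (runs of k consecutive calls) in an API call trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def similarity(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0  # two empty birthmarks are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical traces of calls into the Java standard library.
original = ["HashMap.<init>", "HashMap.put", "HashMap.get",
            "String.length", "StringBuilder.append"]
# Obfuscation renames the program's own identifiers; its library calls stay the same.
obfuscated = list(original)
unrelated = ["ArrayList.<init>", "ArrayList.add", "HashMap.put", "HashMap.get"]

if __name__ == "__main__":
    print(similarity(birthmark(original), birthmark(obfuscated)))  # 1.0
    print(similarity(birthmark(original), birthmark(unrelated)))   # 1/6, about 0.167
```

Because renaming a program's own classes and methods does not change which library calls it makes, the obfuscated copy keeps a similarity of 1.0, while the unrelated trace shares only one of six distinct 2-grams.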
Once such a birthmark is captured, it can be searched for in other programs. By capturing the birthmarks of popular open-source frameworks, one could identify GPL-violating applications.

Java has a very large standard library that is always dynamically linked, and hence can easily be instrumented as the technique requires. C allows static linking, which would make such hooking much more difficult. Additionally, Java executes in a very standardized environment thanks to the virtual machine, whereas other languages may have varying type sizes and other properties that could add significant noise to the birthmark.
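The point about dynamic versus static linking can be illustrated by analogy in Python, whose module functions are looked up at call time. The sketch below (the helper names `record_calls`, `program`, and `program_early_bound` are hypothetical) interposes a recording wrapper on functions of the standard `math` module. Calls resolved through the module at run time are traced, while a reference bound before the hook was installed bypasses it, much as a statically linked copy of a library would. It is an analogy only, not the paper's Java tooling.

```python
import functools
import math
from math import sqrt as early_sqrt  # bound at import time, like a statically linked copy


def record_calls(module, names, trace):
    """Wrap each named function of `module` so every call appends
    "module.name" to `trace`; return a function that restores the originals."""
    originals = {name: getattr(module, name) for name in names}

    def make_wrapper(fn, label):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            trace.append(label)
            return fn(*args, **kwargs)
        return wrapper

    for name, fn in originals.items():
        setattr(module, name, make_wrapper(fn, f"{module.__name__}.{name}"))

    def restore():
        for name, fn in originals.items():
            setattr(module, name, fn)

    return restore


def program():
    # Looks functions up through the module at call time ("dynamic linking").
    return math.sqrt(16) + math.floor(2.5)


def program_early_bound():
    # Uses a reference captured before any hook was installed ("static linking").
    return early_sqrt(9)


if __name__ == "__main__":
    trace = []
    restore = record_calls(math, ["sqrt", "floor"], trace)
    try:
        print(program(), program_early_bound())  # 6.0 3.0
    finally:
        restore()
    print(trace)  # ['math.sqrt', 'math.floor']
```

Only the call made through the module shows up in the trace; the early-bound `sqrt` call is invisible to the hook.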
Anyhow, the real problem with identifying open-source code pilfered and added to a closed-source project is that you generally aren't allowed to reverse engineer the code itself to see what it actually does. So even if you're Very Damn Sure that a piece of commercial software illegally incorporates open-source code and sells it as its own closed-source product, you're not allowed to investigate and gather evidence. You'll have to file a suit and get a judge to order the code examined, and with only a good hunch to go on, no way to document a financial loss, and probably not very deep pockets yourself, that's rather unlikely to go anywhere.
That is why people support institutions like the FSF, which can occasionally fight the battle on behalf of the little guy.
On September 20, the Software Freedom Law Center filed the first
lawsuit in the United States to enforce the GNU General Public License
version 2 ("GPLv2"). The GPLv2 remains the most widely used open
source license: more than 65% of the projects on SourceForge use it.
The plaintiffs, Erik Andersen and Rob Landley, sued Monsoon Multimedia,
Inc. for copyright infringement of the BusyBox software in the Southern
District of New York. The complaint can be found at http://www.softwarefreedom.org/news/2007/sep/20/busybox/complaint.pdf.
The plaintiffs allege that Monsoon Multimedia distributed BusyBox
as part of its firmware but did not make the source code available.

