The theft of code by Microsoft’s Copilot will be decided in court.
Once upon a time, three witches sat around a cauldron. Their names were Embrace, Extend, and Extinguish. They were the three merry witches of Microsoft and under the silvery light of a crescent moon, they were casting a spell to rid the world of their mortal foe – open-source software.
I’m an old hand of the software world and I remember when Microsoft was The Enemy, with a capital “E”. They had openly described open source as “a cancer” in 2001. Extinguish, of all the Microsoft witches, was the one they wanted to call upon. Linux must die. Firefox must burn.
But then, something changed. By 2016, Steve Ballmer had learned to love Linux. Extinguish slipped back into the shadows behind her stygian sisters and Embrace stepped forward, warm arms extended. The old hands had seen this trick before, of course. We knew what to expect. “Look out,” we said, “Embrace comes before Extend and Extend comes before Extinguish”. But nobody listens to old hands. There’s a reason there are so few of us.
Microsoft embraced Linux. It contributed code. It began to play the game. It put Linux at the heart of its cloud hosting platform, Azure. Embrace worked her magic so well this time around that we barely noticed that sometimes the hands around us belonged to Extend. And those hands tend to close around your throat.
When Microsoft bought GitHub, we old hands raised warning flags yet again. Microsoft, the old Enemy, could not be trusted. GitHub was too valuable, too important, to allow it to fall into their hands. But, nevertheless, it happened. We had been embraced. We would be extended. We feared being extinguished.
But, instead of extinguishing us, Microsoft had a new plan. A new witch, creeping from the dirty swamp waters of corporate strategy. This witch, all grasping hands and leash-leather skin, had a name. She was called Enslave.
Thus enters Copilot, Microsoft’s AI tool that can generate code for you. You ask, it writes. It’s a code genie, a wish-granting machine that creates software out of thin air. Except, it doesn’t generate anything. It doesn’t write anything. Copilot copies code from existing projects… existing open-source projects.
Back in the early days of the web, there was a technique called “content spinning” – it involved taking (stealing) someone else’s content, changing some words, and then passing it off as your own. All you needed was a digital thesaurus and a completely absent moral compass.
Ostensibly, Copilot is no different. It’s a smarter spinner, but it’s just a spinner. It’s gobbled up as much code as it can handle, billions of lines from projects stored on GitHub, parsing comments, chewing up variable names, and turning the hard work of a vast number of open-source developers into copy-and-paste patterns that Microsoft can package and resell. For profit. For itself.
They say that the system generates code, but there are numerous examples on the web now of code that has been taken verbatim from open-source projects. The licenses under which open-source code is released normally require attribution of any code that is reused back to the original author. Copilot doesn’t do this. Arguably, feeding billions of lines of open-source code into a Frankensteinian sausage machine is also not what open-source software authors had in mind when they pushed their code to GitHub.
I have code on GitHub. Had I been asked for my consent for it to be used in this way I would have said no. (And not just because I’m an old hand and Microsoft is The Enemy). Read my code? Fine. Reuse my code? Absolutely. Learn from it, copy and paste it? Crack on. But pick it up and sell it? No.
I’m an old-school free software advocate. I like “Free as in Freedom” as well as “Free as in Beer”. Of course, open-source code gets half-inched and put inside proprietary software. It’s inevitable. But Microsoft is committing this larceny on a grand scale, and they are breaking open source licenses (at least in spirit) to do it.
Developers with projects larger than mine aren’t taking things lying down though. Microsoft is being taken to court and the future of Copilot, and AI code generation, will be put to the test.
Copilot doesn’t do quite as good a job as it needs to in hiding its sources. Like a cub reporter cracking under interrogation from their hardened editor, it gives up the goods too easily, spitting back lumps of code that are straight copy and paste from open-source projects.
Not cool, Microsoft.
This court case will take a long time to settle. Courts aren’t good at dealing with technology issues, for one thing, and this will be a landmark case in determining what open code and open data can be used for when training AI.
Like the look of all those fun, text-to-image AI machines? Just remember they took a lot of images created by real, living artists to “train” the AI how to make a picture. Want to use one of those AI copywriting machines? Spare a thought for every writer who has had their work, probably without their knowledge, ground up and fed into the machine.
I’m not a Luddite. I’m not here to smash the looms. I’m fascinated by AI and believe it has huge potential. I’m just not keen on people taking stuff that doesn’t belong to them. As I said, I’m an old hand.
I do hope there will be a few old hands on the jury of this case as well…