Thursday, May 24, 2012

Copyright capitalism

Imagine a world where you weren't allowed to use powerful computers to analyse weather patterns or astronomical data – it sounds nonsensical, yet commercial restrictions imposed by publishers of scientific papers are doing something similar: not only restricting access for the scientific community, but also hindering the work of researchers.

The scale of new information in modern science is staggering: more than 1.5m scholarly articles are published every year, and the volume of data doubles every three years. Two new pieces of research are uploaded to UK PubMed Central every single minute of the day. How can researchers possibly sift through, understand and make new discoveries from the torrent of data in their field? The simple answer is that they can't. But computers can. No individual can keep up with such a volume, and scientists need computers to help them digest and make sense of the information. A technique called text mining uses computers to look for unseen patterns and associations across the millions of words in those articles. Computers have an almost limitless capacity to “read” and can be programmed to analyse enormous datasets, identifying links, trends or patterns that reveal new scientific discoveries, create new products or services, or speed the development of new medicines.

Unfortunately, in most cases, text mining is forbidden. Bergman, Murray-Rust, Piwowar and countless other academics are prevented from using the most modern research techniques because the big publishing companies such as Macmillan, Wiley and Elsevier, which control the distribution of most of the world's academic literature, by default do not allow text mining of the content that sits behind their expensive paywalls. Current copyright law further limits the possibilities of text mining for unlocking science.
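To give a concrete sense of what the forbidden technique actually involves, here is a minimal, hypothetical sketch of text mining: counting how often pairs of terms co-occur across article abstracts so that unexpectedly frequent pairings can be flagged for a human to investigate. The sample abstracts and term list below are invented for illustration only.

```python
# A toy sketch of text mining: find which terms co-occur across abstracts.
import itertools
import re
from collections import Counter

# Hypothetical mini-corpus standing in for millions of paywalled abstracts.
abstracts = [
    "Aspirin reduces inflammation and lowers the risk of thrombosis.",
    "Thrombosis risk is elevated in patients with chronic inflammation.",
    "Caffeine intake showed no association with thrombosis in this cohort.",
]

# Terms a researcher might care about (in practice these would come from
# curated vocabularies such as gene or drug name lists).
terms = {"aspirin", "inflammation", "thrombosis", "caffeine"}

def tokenize(text):
    """Lower-case word tokens, stripped of punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

# Count how often each pair of terms appears in the same abstract.
pair_counts = Counter()
for abstract in abstracts:
    present = terms & tokenize(abstract)
    for pair in itertools.combinations(sorted(present), 2):
        pair_counts[pair] += 1

for (a, b), n in pair_counts.most_common():
    print(f"{a} + {b}: co-occur in {n} abstract(s)")
```

Scaled up to millions of full-text papers, the same idea surfaces associations no individual reader could spot – which is precisely why access to the underlying text matters.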

Professor Peter Murray-Rust was looking for new ways to make better drugs. Dr Heather Piwowar wanted to track how scientific papers were cited and shared by researchers around the world. Dr Casey Bergman wanted to create a way for busy doctors and scientists to quickly navigate the latest research in genetics, to help them treat patients and further their research. All three wanted to make the kinds of discoveries that a person scouring through papers one by one may never notice.

Bergman, an evolutionary biologist at the University of Manchester, used text mining to create a tool to help scientists make sense of the ever-growing research literature on genetics. Though genetic sequences of living organisms are publicly available, discussions of what the sequences do and how they interact with each other sit within the text of scientific papers that are mostly behind paywalls. Working with Max Haeussler, of the University of California, Santa Cruz, Bergman came up with Text2genome, which identifies strings of text in thousands of papers that look like the letters of a DNA sequence – a gene, say – and links together all papers that mention or discuss that sequence. Text2genome could allow a clinician or researcher who may not be an expert on a particular gene to access the relevant literature quickly and easily. Haeussler's attempts to scale up Text2genome, however, have hit a wall, and his blog is a litany of the problems in trying to gain permissions from the scores of publishers to download and add papers to the project. "If we don't have access to the papers to do this text mining, we can't make those connections," says Bergman.
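Text2genome itself is far more sophisticated, but the core idea described above – spotting strings that look like DNA in the text of papers and linking every paper that mentions them – can be sketched in a few lines. The paper snippets and identifiers below are made up for the example and are not Text2genome's actual data or code.

```python
# Illustrative only: index papers by the DNA-like strings they mention.
import re
from collections import defaultdict

# Hypothetical excerpts from full-text articles, keyed by a paper identifier.
papers = {
    "paper_001": "The primer ATGGCGTACCTTGAA amplified the region of interest.",
    "paper_002": "We observed binding downstream of ATGGCGTACCTTGAA in vivo.",
    "paper_003": "No sequences of interest were reported in this study.",
}

# A naive pattern: runs of at least 12 consecutive A/C/G/T characters.
DNA_LIKE = re.compile(r"\b[ACGT]{12,}\b")

# Build an index from each DNA-like string to the papers that mention it.
index = defaultdict(set)
for paper_id, text in papers.items():
    for match in DNA_LIKE.findall(text):
        index[match].add(paper_id)

# A clinician could then look up a sequence and jump straight to the papers.
for sequence, paper_ids in index.items():
    print(sequence, "->", sorted(paper_ids))
```

The hard part is not this matching step but obtaining the full text to match against – which is exactly where the permissions problems described above arise.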

Murray-Rust, a chemist at the University of Cambridge, has used text mining to look for ways to make chemical compounds, such as pharmaceuticals, more efficiently. "If you have a compound you don't know how to make and it's similar to one you do know how to make, then the machine would be able to suggest a number of methods which would allow you to do it." But, although his university subscribes to the journals he needs to do this work, he is forbidden from using the content in what he calls "a modern manner using machines". A member of his research group accidentally tripped the alarms of a publisher's website when he downloaded several dozen papers at once from journals to which the university had already paid subscription fees. The publisher saw it as an attempt to illegally download content and immediately blocked access for the entire university.
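The idea Murray-Rust describes – suggesting synthesis methods for an unknown compound by looking up the most similar known one – can be caricatured as a simple similarity search. The feature sets, compound names and methods below are entirely invented; a real system would derive them by mining the chemical literature.

```python
# A toy similarity lookup: suggest methods from the most similar known compound.
def jaccard(a, b):
    """Similarity between two feature sets (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical knowledge base extracted from the literature.
known_compounds = {
    "compound_A": {"features": {"phenyl", "ester", "hydroxyl"},
                   "methods": ["Fischer esterification"]},
    "compound_B": {"features": {"phenyl", "amide"},
                   "methods": ["amide coupling"]},
}

def suggest_methods(query_features):
    """Return the synthesis methods of the most similar known compound."""
    best = max(known_compounds.values(),
               key=lambda entry: jaccard(query_features, entry["features"]))
    return best["methods"]

# A compound we don't know how to make, described by its structural features.
print(suggest_methods({"phenyl", "ester", "methyl"}))
```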

A recent JISC (Joint Information Systems Committee) report shows that such techniques could enable researchers in UK universities to gain new knowledge that would otherwise remain undiscovered because there is simply too much relevant literature for any one person to read. Such discoveries could lead to benefits for society and the economy. Current copyright law is also imposing restrictions, since text mining involves a range of computerised analytical processes which are not all readily permitted within UK intellectual property law. In order to be ‘mined’, text must be accessed, copied, analysed, annotated and related to existing information and understanding. Even if the user has access rights to the material, making annotated copies can be illegal under current copyright law without the permission of the copyright holder. There is a real risk that we will miss discoveries that could have significant social and economic impact unless the text is freely available and unencumbered.

The restrictions placed by publishers on text mining have led campaigners to view the issue as another front in the battle to make the fruits of publicly funded research available through "open access", free at the point of use. That would allow researchers to text-mine the content freely without needing to request any extra permissions.



Freedom from patent and copyright restrictions, which are forms of private ownership, will almost certainly unlock a tidal wave of new development that may revolutionise areas of science.
