Photo Credit: Unsplash/Patrick Tomasso
With so much research published every day around the world, a powerful search engine has become essential for sifting through the seemingly endless flood of academic papers. Faced with that challenge, one technologist has found a way to unlock the world's research papers for easier computerised analysis. He has released an online index of some 107.2 million journal articles, including many paywalled research papers, totalling 38TB of data in uncompressed form.
The General Index, created by American archivist Carl Malamud, was released on October 7 and is free to use. The index holds more than 355 billion sentence fragments and words, each listed alongside the articles in which it appears. “It is an effort to help scientists use software to glean insights from published work even if they have no legal access to the underlying papers,” Malamud told the journal Nature.
The index's primary objective is to aid text mining: using computers to rapidly scan millions of data points for references to something specific. No human could read millions of journal articles, but a computer programme connected to the General Index can.
Researchers who had early access to the index have called it a big development. Gitanjali Yadav, a computational biologist at the University of Cambridge, UK, who studies volatile organic compounds emitted by plants, said the index will help researchers tap into papers that existed but were effectively out of reach. Until now, researchers could mine only open-access papers or those covered by their subscriptions; the index removes that barrier.
Malamud said his index contains only snippets up to five words long, so releasing it does not breach publishers' copyright restrictions.
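The core idea of such an index can be sketched as a mapping from short word n-grams (capped here at five words, matching the snippet limit Malamud describes) to the articles that contain them. This is only an illustrative toy, not the General Index's actual format or tooling; the corpus, IDs, and function names below are invented for the example.

```python
# Toy sketch of an n-gram index: maps each word n-gram (1-5 words)
# to the set of article IDs in which it appears. Illustrative only --
# not the General Index's real data format.
from collections import defaultdict

def ngrams(text, max_n=5):
    """Yield every word n-gram of length 1..max_n from the text."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def build_index(articles):
    """Build a mapping from n-gram -> set of article IDs containing it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for gram in ngrams(text):
            index[gram].add(article_id)
    return index

# Hypothetical miniature corpus standing in for millions of papers.
articles = {
    "paper-1": "Volatile organic compounds emitted by plants",
    "paper-2": "Organic compounds in soil samples",
}
index = build_index(articles)
print(sorted(index["organic compounds"]))  # → ['paper-1', 'paper-2']
```

A query for a phrase then becomes a single dictionary lookup, which is what lets software locate every article mentioning a term without anyone reading the papers themselves.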