1. Definitions of Big Data:
As Anderson noted in 2008, we are living in “the petabyte age”; this is “the era of big data, where more isn’t just more. More is different.” Big data is a term that has yet to be operationally defined; however, De Mauro, Greco and Grimaldi offer a solid foundation for defining it. By analysing 1,437 articles that discuss big data, they identified word clouds centred on four themes: information, technology, methods, and impact. These authors therefore defined big data as “the Information asset characterized by such a High Volume, Velocity, and Variety to require specific Technology and Analytical Methods for its transformation into value”.
Ed Dumbill, the editor-in-chief of a journal devoted to the topic of big data, offered this broader, more conceptual definition: “Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”
2. Characteristics of Big Data:
2.1. Volume: Big data first and foremost has to be “big,” and size in this case is measured as volume.
It implies enormous volumes of data generated by human interactions, machines, cloud computing, datasets, and networks. Whether a particular dataset can be considered big data depends on its volume.
2.2. Variety: Variety refers to the heterogeneous sources and nature of data, both structured and unstructured. Data in the form of emails, photos, videos, monitoring-device readings, PDFs, audio, and so on is now also being considered in analysis applications. Similarly, libraries hold data from e-resources, digital libraries, institutional repositories, cataloguing records, courseware, and the contents of periodicals and archives.
2.3. Velocity: Velocity refers to the speed at which data is generated. How fast data is generated and processed to meet demand determines the real potential in the data. Data flows in from different sources such as business processes, machines, networks, and human interactions through social media sites, mobile devices, and so on. In the context of big data, velocity refers to two related concepts: the rapidly increasing speed at which new data is being created by technological advances, and the corresponding need for that data to be digested and analyzed in near real time.
2.4. Veracity: Veracity emphasizes the importance of ensuring the reliability and integrity of data. It is not just about data quality; it also includes data understandability.
Veracity also covers biases, noise, and abnormalities in data, and asks whether the data being stored and mined is meaningful to the problem being analysed.
2.5. Volatility: Big data volatility refers to how long data remains valid and how long it should be stored. In this world of real-time data, one needs to determine the point at which data is no longer relevant to the current analysis. Volatility is a useful trait for describing big data analytics because it puts the emphasis on how quickly the data changes.
3. Big Data and Libraries:
The growth of digital resources, and of tools to manage large amounts of data, has made big data something libraries can adopt quickly. Digitization initiatives, scientific research, and the social web have created a situation in which scholars are gaining access to huge quantities of data on an unprecedented scale, leading to big data. Library and information professionals have a potential role to play in using big data to help their patrons, and they are uniquely suited to work with it. Libraries have a long tradition of being early technology adopters, and big data should be no exception (Huwe, 2014).
There are several ways librarians can get involved with big data on behalf of their libraries and of users in different locations. One of these is through collection development and the preservation of datasets. As library users become interested in using information from big data, they will need guidance and material to work with. Librarians are well positioned to help users understand how and where to find these datasets and to preserve them for future users. Another way librarians can get involved with big data is by working within their academic and research institutions to assist with research data management.
Researchers need assistance with data management, especially because many funding agencies now have very specific data retention guidelines that must be followed. Librarians can “help researchers, even in the planning phases of their projects, to appraise and think about the archival and preservation options for their data, as well as the potential for sharing their data.” (Creamer, 2014). Librarians also need to act as a voice for balance and reason within their organizations. While big data analytics has many interesting possibilities that should be explored, there is no substitute for more traditional methods of research.
Libraries have amassed an enormous amount of machine-readable data about library collections, in both physical and electronic formats, over the last five decades. However, this data is currently held in proprietary formats understood only by the library community and is not easily reusable by other data centers or across the Web. This has prompted organizations like OCLC and major libraries around the world to begin exposing this data in ways that make it machine-accessible beyond the library world, using commonly accepted standards.
As with any other resource or research method, librarians will need to help their patrons understand what big data can and cannot do, and how it can best be used to achieve their research goals. Librarians around the world have had initial success in offering services that support researchers’ data-related needs, and these needs will likely only increase as time goes on. Increasingly, funders and journals are requiring data sharing and the submission of data management plans; even where they do not yet do so, researchers in many fields should expect to be required to comply with such policies within the next few years. As a result of such policies, the amount of freely and publicly available research data continues to grow rapidly. Researchers will likely need assistance in learning how to access and use these datasets.
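The idea of exposing catalogue data with commonly accepted Web standards can be made more concrete with a small sketch. One widely used option is the schema.org vocabulary serialized as JSON-LD. The Python below is only an illustration under stated assumptions: the record layout, field names, placeholder ISBN, and the `library.example.org` base URI are all hypothetical, and this is not how OCLC or any particular library actually publishes its data.

```python
import json

# Hypothetical, greatly simplified catalogue record. Real records (e.g. MARC 21)
# carry far more fields; these field names are illustrative assumptions only.
record = {
    "id": "b1234567",
    "title": "An Example Monograph",
    "author": "Doe, Jane",
    "year": "2014",
    "isbn": "9780000000000",  # placeholder, not a real ISBN
}


def to_schema_org(rec, base_uri="https://library.example.org/record/"):
    """Map a simplified record onto schema.org Book terms as a JSON-LD dict.

    base_uri is a hypothetical namespace used to give the record a stable
    Web identifier (@id) that other systems can link to.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "@id": base_uri + rec["id"],
        "name": rec["title"],
        "author": {"@type": "Person", "name": rec["author"]},
        "datePublished": rec["year"],
        "isbn": rec["isbn"],
    }


print(json.dumps(to_schema_org(record), indent=2))
```

Because the output uses shared schema.org terms such as `name`, `author`, and `datePublished` rather than library-specific codes, software outside the library community can interpret it without knowing the original cataloguing format.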
Data science and data-intensive research are frequently team-based and interdisciplinary, relying on the varied expertise of many different collaborators. Librarians can potentially play an important role in such research, bringing expertise in information and knowledge management. At a time when many libraries face funding challenges and must demonstrate their continued relevance, expanding services to encompass research data management and other data science services may be a useful way to provide novel types of support to their user communities.
4. Conclusion:
While the capabilities of big data are only beginning to be understood, its possibilities have caught the attention of the library and information science field. In the context of scientific research, librarians can fill a service gap by establishing guidelines and best practices and by providing guidance on the creation of big data.
Another possible area of collaboration is library professionals’ ability to build reliable data repositories. The proliferation of data in research is undoubtedly affecting the information profession and creating career opportunities. Like other professionals, librarians will meet users’ high expectations by adapting to new technology and keeping abreast of the latest trends in research.