= Data Mining for Libraries =

Data mining has been defined in a number of ways. A comprehensive and accurate definition comes from Palace (1996):

"Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information - information that can be used to increase revenue, cuts costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases."

== Improving library services ==
Libraries want to know how their search engines and other services are being used, but their ability to find out is typically limited. For example, Smith (2005) reported that his Unicorn software could track searches on the library website but could not track anything in OPAC searches. That changed when he replaced Unicorn with SirsiDynix Director’s Station.

“Once users left our Web page and got to the OPAC I didn’t have a clue [what kind of searches they were doing]. With Director’s Station I can tell you for any time period how many searches were done, how many hits, the average hits per search, and also the number of sessions. I can break that down into browse or keyword. From there I can tell you how many author or title browses were done. I can tell you what kinds of keyword searches were done: author, title, subject, serial, or general. I can go a step further and tell you exactly which authors, subjects, titles, and general keywords were searched for” (Smith).

In addition, user log entries and check-out records became much easier to track and store with Director’s Station. The time needed to prepare mandatory annual reports fell from one week to half an hour (Smith, 2005). Of course, the data mining system can only track registered users, but anyone with a library card is a registered user.
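The kinds of counts Smith describes (searches, hits, average hits per search, sessions, and breakdowns by search type) can be illustrated with a short sketch. This is not how Director’s Station works internally; the `SearchEvent` record, its field names, and the sample data are all hypothetical, chosen only to show how such a report could be derived from a search log.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SearchEvent:
    """One OPAC search. All field names here are assumptions for illustration."""
    session_id: str  # hypothetical session identifier
    mode: str        # "browse" or "keyword"
    index: str       # e.g. "author", "title", "subject", "general"
    query: str       # what the patron typed
    hits: int        # number of results returned


def summarize(events):
    """Compute the totals Smith lists for a set of search events."""
    total_searches = len(events)
    total_hits = sum(e.hits for e in events)
    return {
        "searches": total_searches,
        "hits": total_hits,
        # Guard against an empty log to avoid dividing by zero.
        "avg_hits_per_search": total_hits / total_searches if total_searches else 0.0,
        "sessions": len({e.session_id for e in events}),
        "by_mode": Counter(e.mode for e in events),
        "by_mode_and_index": Counter((e.mode, e.index) for e in events),
        # Case-insensitive count of the exact terms searched for.
        "top_queries": Counter(e.query.lower() for e in events).most_common(3),
    }


# Made-up sample log entries.
events = [
    SearchEvent("s1", "keyword", "author", "Morrison", 12),
    SearchEvent("s1", "keyword", "subject", "gardening", 40),
    SearchEvent("s2", "browse", "title", "Beloved", 3),
    SearchEvent("s3", "keyword", "subject", "Gardening", 40),
]

report = summarize(events)
print(report["searches"], report["sessions"], report["avg_hits_per_search"])
```

Restricting the report to a time period, as Smith mentions, would only require filtering the events by a timestamp field before calling `summarize`.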

Lawson (2009) identified social tagging as a data mining source for subject categories. In her study, she found that user tags are typically a mix of the objective and the subjective. Subjective tags would be difficult to incorporate into a library’s subject categories. For most categories studied, however, there were enough objective tags to consider adding them to the controlled vocabulary of libraries (p. 578). As more user tags are added, patrons will be able to find material they never would have found with our strict subject definitions.
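One way to picture the idea of mining tags for vocabulary candidates is the sketch below. It is not Lawson's method: the subjective-tag list, the `min_users` threshold, and the sample data are assumptions made for illustration. It keeps only tags that several different users applied, that are not on a hand-made list of subjective tags, and that are not already in the controlled vocabulary.

```python
from collections import defaultdict

# Hypothetical (user, item, tag) triples from a social tagging site.
taggings = [
    ("u1", "book-1", "Gardening"),
    ("u2", "book-1", "gardening"),
    ("u3", "book-1", "perennials"),
    ("u4", "book-1", "perennials"),
    ("u1", "book-1", "to-read"),
    ("u2", "book-1", "favorite"),
    ("u3", "book-1", "favorite"),
]

# Assumed hand-made list of subjective or personal tags to exclude.
SUBJECTIVE_TAGS = {"to-read", "favorite", "boring", "awesome"}

# Assumed terms already present in the library's controlled vocabulary.
CONTROLLED_VOCABULARY = {"gardening"}


def candidate_terms(taggings, subjective=SUBJECTIVE_TAGS,
                    vocabulary=CONTROLLED_VOCABULARY, min_users=2):
    """Return objective tags used by at least `min_users` distinct users."""
    users_per_tag = defaultdict(set)
    for user, _item, tag in taggings:
        # Normalize case and whitespace so "Gardening" and "gardening" match.
        users_per_tag[tag.strip().lower()].add(user)
    return sorted(
        tag
        for tag, users in users_per_tag.items()
        if len(users) >= min_users
        and tag not in subjective
        and tag not in vocabulary
    )


candidates = candidate_terms(taggings)
print(candidates)
```

In practice, deciding which tags count as subjective is a judgment that a cataloger would make; the fixed list here only stands in for that review step.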

With all of this information available to us, many possibilities are open. We can see which services are being used the most, how they are being used, and by which demographic group(s). Kovacevic, Devedzic, & Pocajt (2010) conducted a formal study of how libraries can use data mining techniques to improve services. Based on the results, in which they found “overall accuracy is satisfactory,” they said, “In the near future, we plan to add an effective visual representation for recommending specific services to the users and to discover most significant problems that the users encounter by using text mining techniques to analyze the users’ e-mails or free text interviews” (p. 840).

== Privacy ==
With all of this information and advanced analysis available now, how will data mining affect librarians' professional commitment to user privacy? Kovacevic, Devedzic, & Pocajt (2010) addressed this by saying, “To protect the users’ privacy, we replaced their names and institutions with codes. Hence, only authorized individuals will be aware of the users’ identity and the institution he/she is with” (p. 834). This is a good security measure, but is it enough? In order for these measures to work

Despite any privacy concerns, libraries may have to move forward with data mining systems, not just to provide better service but to justify our existence. Recent economic woes have put more pressure on libraries to fight for their budgets, or even to keep from being eliminated altogether. We need more comprehensive information to make that argument. As Smith (2005) says,

"Statistics based on the ALA Output Measures just don’t go far enough any longer….We need to be able to take all that wealth of information on our servers and translate it into useful elements for setting trends and analyzing our position. We need the ability to mine the data and present it to all, library administration, boards, funders, and the public. We need that information to plan for the future" (Smith).

**References**
Kovacevic, A., Devedzic, V., & Pocajt, V. (2010). Using data mining to improve digital library services. //The Electronic Library, 28//(6), 829-843.

Lawson, K. G. (2009). Mining social tagging data for enhanced subject access for readers and researchers. //The Journal of Academic Librarianship, 35//(6), 574-582.

Palace, B. (1996). Data mining: What is data mining?

Smith, G. (2005, December). Is there data mining for libraries? //SirsiDynix OneSource Monthly e-Newsletter for the Worldwide SirsiDynix Community, 1//(4).