Bill C-61

June 23rd, 2008

Despite following the Bill C-61 fallout fairly closely, I have to admit that I feel fairly distant from the core of the debate. To me, the rights of creators vs. rights of consumers issue is off-target: once again we are allowing the very terms of the debate to be dictated by the dominant relations of production. This is not to say that I disagree with those who are attempting to raise awareness, those who are vehemently opposing Bill C-61, but to me this disagreement remains firmly within the established discourse. In pointing out the flaws *within* Bill C-61 we are implicitly allowing the government (plus lobbyists, corporations, etc) to set the terms of the debate. By arguing that “consumers” have the right to make use of “their” property, we are pinned once again to the very property relations which define our society. Obviously, this makes it impossible to effect any change to society. Even if we manage to get Bill C-61 amended, the government and the corporations win because they have made us play their game, the game of existing property relations, by defending our “right” to use our own property.

The existing property relations see everything human beings can possibly use as a commodity. We are used to natural resources, products, etc, being commodities, but the purpose of “copyright” is to turn cultural products into commodities. Copyright only came into existence when books became seen as commodities rather than cultural products of a more complicated kind. Copyright now extends to cover all cultural artifacts (photographs, music, etc) - thereby turning as many aspects of human culture as possible into commodities. It isn’t the details of the DMCA that we need to fight, but the very concept of cultural artifacts as commodities. Music has a much longer history than our current relations of production; those relations have already denied the vast majority of people access to music except as passive consumers (where previously, active participation and production of music was the norm). Rather than arguing who has the right to copy what commodified form of music, we should be demanding the right to music for ourselves.

I’m reminded of another ancient cultural item which was commodified under the British in India: Salt. Gandhi’s famous salt march showed Indians that they could reject the property relations the British had imposed on them, that they could take back an indigenous cultural artifact. The salt march remains an important symbolic - but very real - event in the history of Indian resistance to the British. Perhaps we need a salt march of our own.

The government has said they will not make a nation of criminals out of Canadians. Perhaps that is a role we should take on ourselves. Perhaps we should fight not only Bill C-61 but the entire set of property relations by downloading, openly and in great quantity, not out of acquisitiveness, but out of resistance, just as under Gandhi Indians and South Africans struck not for higher wages but to change the entire regime.

Download, be open about it, don’t pay the fine, get arrested. The more ordinary Canadians are put in jail, are unable to go to work, the more chance we have of effecting our own kind of change, not just the kind of change the lawyers and the businessmen allow us.

Coding is for Chumps

June 17th, 2008

Alright, well there’s some new Copyright legislation, obviously, but I haven’t had time to digest it all yet. Clearly, it comes down hard on the side of existing property relations and power structures, but I’ll need a bit more time to think through what it all means.

So, on to work-related stuff. In the last two weeks I’ve written three programs (scripts really; they’re too solution-specific to be called programs): one in JavaScript, one in Ruby, and one in PHP. Now I remember why I switched from Computer Science to History: it’s fun for a while, but I would never want to code for a living.

JavaScript: We wanted to do something at the University of Ottawa akin to what Ryerson developed, a little script, accessible off a bib-record page, which would take a user’s cellphone number and provider and text them the title, author, and call number. For those times when you’re on the 5th floor without a pencil or a piece of paper and need to remember the info for the book you want. Ryerson implemented theirs with ColdFusion, I did mine in straight JavaScript (more or less). The hardest part was screen-scraping the statement of responsibility and the call number out of our OPAC’s bib-display, but the good news is that I now have a couple of general-purpose JavaScript functions which can pull out just about anything from our non-well-formed, non-XHTML, dynamically-created screens. The next step was to pass this data through to a form for capturing the user’s information, and then on again to a small PHP sendmail script which actually sends the message. This isn’t one of those multifunctional SMS programs: I’m just sending an email to an address composed of phone number and known carrier domain.
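A minimal sketch of the last step described above, composing an email address from a phone number and a carrier domain, written in Python rather than the JavaScript and PHP used on the actual site. The carrier names, gateway domains, sender address, and function names are all hypothetical placeholders; real gateway domains would have to come from each carrier, and the 160-character cut-off is an assumption about SMS length.

```python
import re
from email.message import EmailMessage

# Hypothetical carrier-to-gateway table; the domains are placeholders.
CARRIER_DOMAINS = {
    "carrier_a": "sms.carrier-a.example",
    "carrier_b": "txt.carrier-b.example",
}

def gateway_address(phone, carrier):
    """Turn a phone number and carrier key into an email-to-SMS address."""
    digits = re.sub(r"\D", "", phone)  # drop spaces, dashes, brackets
    if len(digits) != 10:
        raise ValueError("expected a 10-digit phone number")
    return f"{digits}@{CARRIER_DOMAINS[carrier]}"

def build_message(phone, carrier, title, author, call_number):
    """Build (but do not send) the email that carries the bib info."""
    msg = EmailMessage()
    msg["To"] = gateway_address(phone, carrier)
    msg["From"] = "library@example.org"  # assumed sender address
    msg["Subject"] = "Library item"
    body = f"{title} / {author}\n{call_number}"
    msg.set_content(body[:160])  # assumed single-SMS length limit
    return msg
```

Sending is left out on purpose: the message object could be handed to `smtplib` or any local mail setup, which is the part the small PHP sendmail script plays in the original.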

Ruby: We currently have 7000 theses in PDF format that need to be batch imported into DSpace. The procedure for this is fairly straightforward: create a directory for each thesis called item_xxx. In the directory, place the PDF thesis, a contents file, and a dublin_core.xml file containing the thesis metadata. Luckily, we have complete MARC records for each thesis held in plain text files on DVDs. The only snag I ran into was that my sample MARC data would be processed fine, but the actual thesis MARC data would break my Ruby library (and, it turned out, Terry Reese’s MarcEdit program as well). It looks as if there’s either a problem with the directory length in each MARC record, or with the encoding of the text file. Either way, if I use MarcEdit first to convert the file to UTF-8 and then from MARC to MARCXML it works. And, oh yeah, DSpace doesn’t use the Dublin Core DTD for its metadata files. No biggie.
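The directory layout described above can be sketched roughly as follows, in Python rather than the Ruby used for the real job. The function names, the four-digit zero-padded numbering, and the example metadata are assumptions; the `dcvalue` element with `element` and `qualifier` attributes follows DSpace's own (non-DTD) metadata format as I understand it, and should be checked against the documentation for the DSpace version in use.

```python
from pathlib import Path
import shutil
import xml.etree.ElementTree as ET

def build_dublin_core(metadata):
    """Build a <dublin_core> element from (element, qualifier, value) tuples."""
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        dcvalue.text = value
    return root

def write_item(archive_dir, number, pdf_path, metadata):
    """Create one item directory holding the PDF, contents and dublin_core.xml."""
    pdf_path = Path(pdf_path)
    # Assumed naming: item_0001, item_0002, ... (the post only says item_xxx).
    item_dir = Path(archive_dir) / f"item_{number:04d}"
    item_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(pdf_path, item_dir / pdf_path.name)
    # The contents file lists the files belonging to the item, one per line.
    (item_dir / "contents").write_text(pdf_path.name + "\n", encoding="utf-8")
    tree = ET.ElementTree(build_dublin_core(metadata))
    tree.write(str(item_dir / "dublin_core.xml"), encoding="utf-8", xml_declaration=True)
    return item_dir
```

Calling `write_item("archive", 1, "thesis.pdf", [("title", "none", "A Thesis")])` would produce `archive/item_0001/` with the three files; the metadata tuples would in practice be derived from the converted MARCXML.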

PHP: This one is the killer. We’d like a web-form that takes a bibliographic citation and returns a DOI, a Persistent URL (if they exist) and/or an OpenURL. After capturing the bibliographic data in the form I passed it down to a PHP script. It was easy to create the OpenURL with our SFX prefix, and it was also easy to create - not a Persistent URL - but at least a redirectable URL to the actual article using CrossRef (one of the big DOI registries). But do you think I can retrieve the DOI from CrossRef’s XML response? I can do it in Ruby, and I can do it in PHP from the command line, but I can’t seem to do it with PHP in the browser. It doesn’t help that PHP 5 on my GeekISP server can’t use fopen() URLs (fair enough), but when I moved to our production server running PHP 4, it turns out I can’t even use simplexml. Right now the form returns the OpenURL, the CrossRef Redirect, the CrossRef XML, and a nice snazzy CrossRef logo that says “No DOI Found” - even on a test citation that exists in their database. I don’t know where to go from here with this one.
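For what it is worth, one common reason a DOI lookup comes back empty is that the XML response declares a default namespace, so a plain search for `doi` never matches. That may or may not be what is happening here. The sketch below, in Python, finds a `doi` element by its local name regardless of namespace. The sample response is invented for illustration and is not CrossRef's actual schema; the namespace URI and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample response; NOT CrossRef's real schema or namespace.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<result xmlns="http://example.org/hypothetical-schema">
  <query status="resolved">
    <doi type="journal_article">10.1000/example.doi</doi>
  </query>
</result>"""

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]

def find_doi(xml_text):
    """Return the text of the first non-empty <doi> element, or None."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    for element in root.iter():
        if local_name(element.tag) == "doi" and element.text:
            return element.text.strip()
    return None
```

Matching on the local name is a blunt instrument; where the namespace is known and stable, passing an explicit namespace mapping to the parser's find methods is the tidier choice.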

That’s about it. Stay tuned for my unmissable take on the DMCA. I’m off to have a beer and watch Holland destroy Romania while Italy and France tango.

Property and Privacy

May 27th, 2008

The other evening I was thinking about all the debate going on about Canadian copyright legislation, and I realized what my fundamental disagreement is: I’m against private property in general. Having dutifully worked my way through both the Communist Manifesto and Das Kapital (well, volume one anyway), I came to agree with the idea that property (by which Marx means “economically significant” property - not your pencil or my Nalgene bottle) must be liberated from private ownership if we ever hope to make any headway at all in our social existence. Private ownership of property (a phrase which obscures the fact that private ownership is concentrated in a very few hands) is the root of social inequality under capitalism (from the ghettoization of the poor, to the exploitation of labour by capital, etc.). Copyright is a legal protection on one form of property, and even though the property is “intellectual” (a bourgeois euphemism) I can’t be a supporter of it.

Now, I realize that there is a socially and politically responsible argument that can be made in favour of copyright, but it is only a fig-leaf. The rights of artists are not guaranteed by copyright, only the right of someone (often not even the artist) to profit by the artistic product. It mustn’t be forgotten that copyright was initially introduced as a protection of profit, not of artistic endeavour. Charles Dickens’s works were being bootlegged in America, but he was still identified as the author of his works. In fact, the bootlegged copies of his works would not have sold as well as they did had he not been so identified. Digital music works in the same way: people don’t download “a rap track”, they download tracks by 50 Cent; they don’t download “an album that came out last week”, they download Modest Mouse’s new album. What is under attack by copyright circumvention is not the artistic endeavour, but the profit to be gained by it. If one is against corporate profit, one must be for copyright circumvention.

What about the artists? What incentive is there for them to produce? Incentive is another bourgeois conceit: for most of the world’s history, artists needed no incentive, and indeed received none. Patronage, yes; commercial interest, definitely. But to impose the idea of “incentive” is an anachronism: patronage and commercial exploitation were the context of artistic endeavour just as much as amateur production, the artistic product of the rich and leisured (what incentive did they have?). Culture is ordinary; human beings need no incentive to live culturally. And anyone who holds his or her artistic work hostage to profit can hardly be called an artist.

So much for the “property” side of “private property” - what about the “private” side? As a librarian, I am supposed to be “for” protecting privacy as much as “for” protecting copyright. But what function does privacy serve these days? I suspect that it is a meme which fundamentally protects the interests of those who own and control private property (imagine the concept of an “economically significant” fact which must be kept private/secret, as a kind of economically significant property). Most of the facts about the ordinary person do not need to be protected. This may come as a shock, but the current wave of exposure across the internet seems to support me in this. I’m tempted to say that “privacy is dead”, but like Zarathustra I would be jeered for continuing to think otherwise for so long.

I have two current situations which seem to support the protection of privacy. In the first place, there has been much discussion on some of my listservs recently about recommendation engines (e.g. Amazon’s). If an engine uses too little data, privacy is compromised (in the sense of: you know exactly what I read). But so what? If you read the Marquis de Sade, or the latest JavaScript book, or Harlequin Romances, what is it to me - unless I am someone (again, e.g. Amazon) who hopes to profit financially from your choices? Remove the profit motive - the “incentive” - to exploit personal information, and we will find that we really don’t care what people know about us. In fact, we don’t care now, except insofar as that knowledge is exploited in ways we don’t like.

I spoke once in library school to a librarian/law professor about RFIDs and privacy. There was concern about people accessing information on RFIDs remotely and using that information. This law professor pointed out that no matter what information is gathered, the moment that information is used, the law can (and does) step in to regulate it. It’s not that information needs to be protected, it’s that exploitation needs to be curbed.

Which brings up the second story: http://www.boingboing.net/2008/05/26/uk-set-to-deport-mas.html

A UK Master’s student is set to be deported because he accessed Al Qaeda training materials as part of his research. He was “ratted out” by his own university. Now, there are two pieces of personal information being exploited here: the student’s race (Algerian) and the set of materials he accessed. Had the student’s file captured race (or, more likely, country of origin) and the library captured his reading history, or the university’s computer services his download history, and done nothing with that information, there would be no exploitation, nothing would have come of the information, and indeed, no one would have cared. However, now that the information has been exploited (to have the student deported) the law must step in. That the law is the instrument of deportation is another matter. My point is that the protection of private information should not be the focus, but the curbing of use and abuse of that information. Again, once we remove the need to exploit information (either for profit or power/security [e.g. the U.K. Terrorism Act]), then the impetus to track that information will die out. Would we be afraid of Google tracking all our information if we knew they couldn’t use it for anything without breaking the law? Would we expect Google to be tracking our information if they couldn’t profit by it? In a world that dealt equitably with everyone, Al Qaeda would be unnecessary, as would the monitoring of reading habits, and the abuse of state power. The state, one would hope, would wither away.

Hasta la revolucion siempre.

Interoperability

May 8th, 2008

Is a big scary word. At my interview for this position I was asked about it and realized that, although I could mention certain things (“Z39.50!”) I really didn’t know what I was talking about. Having been to Code4Lib this year, I now realize that “interoperability” is really what technology and librarianship are all about.

As a simple and no doubt crude example, think of a reference interview. During the course of the reference interview, you ask for information from your patron, reformat/refactor the results of your question, perhaps check the catalogue or a database, and return more information to your patron. This is interoperability between a user and one or many data sources (you, your catalogue, your database) and a controller (you as a reference librarian).

Now, we have data sources all over the place. Every ILS, every database, every website, every journal is a mine or silo of information. In the “information age” all this information is stored electronically, in a multiplicity of formats. Now, under Web 1.0, our users were human beings - it’s very easy for a human being to be presented with many different formats of data and adapt to them all, more quickly, easily, and efficiently than a computer. But under Web 2.0 our users (or user-agents) are other computer programs, and they don’t cope so well with many different data formats. A human user can also adjust for “bad” data (think of typos, bad punctuation, missing pages, etc.) but a programmed user-agent can’t. Interoperability grew out of the desire (and need) for the vast warehouses of data stored by libraries to be made accessible to programmed user-agents. To achieve this, most of our work has fallen into two categories: data format and information retrieval.

Data Format

AACR2 was designed to be human readable (e.g. on a catalogue card); MARC was designed as a way to transfer bibliographic data electronically without loss. Neither of these formats is good for computer interoperability. MARCXML might be a good idea, but retains a lot of the shortcomings of MARC. Dublin Core was an interesting idea, but seems to be too lightweight (i.e. doesn’t provide the richness we’d like to see in our metadata). MODS is an XML format which seems to be less cumbersome than MARC but richer than Dublin Core. Then there are all the other (primarily XML) formats out there.

Information Retrieval

We’ve achieved this through different kinds of protocols, all of which can be scripted and plugged into user-interfaces. OAI-PMH is a client/server model for “harvesting” (batch retrieval) of data, rather than search. SRU is a “lightweight” method to search a server by putting query data in the URL; SRW is its SOAP-based counterpart. OpenSearch - well, I’ve just been reading a debate about the relative merits of OpenSearch and SRU.
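To make “query data in the URL” concrete, here is a small Python sketch that builds an SRU search request and an OAI-PMH harvesting request. The parameter names (`operation`, `query`, `maximumRecords`, `verb`, `metadataPrefix`) come from the SRU and OAI-PMH specifications; the base URLs and function names are hypothetical, and a real server may require a different SRU version or extra parameters.

```python
from urllib.parse import urlencode

def sru_search_url(base_url, cql_query, maximum_records=10):
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"

def oai_list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords URL for batch harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"
```

The difference between the two shows up in the parameters: the SRU request carries a query, while the OAI-PMH request simply asks for everything in a given metadata format.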

Why Should We Care?

You know how you can find your house in Google Maps, then locate all the pizza places within three blocks? They did that because in the real Web 2.0 world, interoperability has always been the watchword. Standard protocols, standard XML schemas, these all make mashups that much easier. If we could unlock the data stored in antiquated, difficult-to-use, non-standard formats in our library systems, the information might be of some day-to-day use, which is one of the things librarians usually say they want.

What Can You Do?

Create APIs for any digital library, institutional repository project, etc, that your organization develops. Ensure that standard data formats are used wherever possible. Create “mashups” of your own, using the protocols outlined above, and the libraries for your favourite scripting language, if only as proof-of-concept. A Web API can be created quickly and easily in Rails, and simple REST, SOAP, and XML-RPC clients to use OAI-PMH, SRU, and OpenSearch can be created with ease and flexibility in, for example, Ruby.

One Last Thing

As new librarians, I think some of us overestimate MARC. I was playing around with Ruby scripts to try to read MARC data out of files and store the records in a MySQL database. I kept getting screwed up by the fact that a MARC RDBMS schema seemed necessary, but ugly. The fact that I couldn’t find an example of one on the web indicated that other people weren’t concerned with a MARC database schema. Eventually I emailed Terry Reese, developer of MarcEdit, and asked him the score. His courteous response can be summed up as follows:

  • MARC is a transmission format, and was never intended as a data storage format
  • What library programmers tend to do when faced with collections of MARC data, is to strip the data out into logical, clear fields of any flavour (Dublin Core, perhaps, or ISBD). This is “normalizing” the data
  • The resultant “clean” data can then be indexed, searched, formatted, converted to XML, retransmitted, or whatever
  • The MARC record can be stored as a “blob” (unstructured data) in a database field, in case the MARC information is required again (for instance, if an ILS transforms MARC into an OPAC display)
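The approach summed up in the list above might look something like this in Python, using SQLite in place of MySQL so the sketch is self-contained. Everything here is illustrative: the field mapping, the table layout, and the assumed in-memory shape of a parsed record (a list of tag and subfield pairs, which a MARC library would have to produce first) are assumptions, not anything taken from Terry Reese's reply.

```python
import sqlite3

# Hypothetical mapping from (MARC tag, subfield code) to a clean field name.
FIELD_MAP = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("260", "c"): "date",
}

def normalize(parsed_record):
    """Strip a parsed MARC record down to a few clean, named fields.

    parsed_record is a list of (tag, {subfield_code: value}) pairs, an
    assumed shape that a MARC parsing library would need to supply.
    """
    clean = {}
    for tag, subfields in parsed_record:
        for code, value in subfields.items():
            name = FIELD_MAP.get((tag, code))
            if name and name not in clean:
                # Crude removal of trailing ISBD punctuation.
                clean[name] = value.rstrip(" /:,.")
    return clean

def store(connection, parsed_record, raw_marc):
    """Save the clean fields for searching and the raw MARC as a blob."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(id INTEGER PRIMARY KEY, title TEXT, creator TEXT, date TEXT, marc BLOB)"
    )
    clean = normalize(parsed_record)
    cursor = connection.execute(
        "INSERT INTO items (title, creator, date, marc) VALUES (?, ?, ?, ?)",
        (clean.get("title"), clean.get("creator"), clean.get("date"), raw_marc),
    )
    return cursor.lastrowid
```

The clean columns are what gets indexed and searched; the `marc` column is only there so the original record can be handed back untouched when something needs it.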

And Finally

I was going to write a blog post about this, but I couldn’t do it without being abusive. What’s with the hippy-dippy, emo, flower-power, Cat Stevens aesthetic that seems to be such an important part of the library world? Passion quilt? There is only one you? This reminds me of the inspirational posters that have been mocked for so many years on the web. Vague, pointless, and distinctly unhelpful. Could be worse, I guess: could be an Elvis on Black Velvet.

Top Technologies

April 22nd, 2008

April 14 marked three months in my position as Emerging Technologies librarian at the University of Ottawa and I think I’ve managed to identify the technologies which (while not necessarily still emerging) are necessary for the job. Here goes:

1. OPAC: we never got our hands dirty in library school, but understanding the different vendors and products out there is essential. Understanding a bit of MARC and a bit of information retrieval (indexing and searching, etc) is helpful here.

2. Scripting: Ruby, JavaScript, Python, PHP: When I started programming, my friend and I decided to learn C/C++. Never mind that this might have been biting off more than we could chew, it stood us both in good stead when we went to university and had to learn Pascal. But actual compiled languages aren’t quick and dirty enough for the work I’m doing. Last year I picked up a bit of PHP, but I like Ruby quite a bit server-side, and JavaScript client-side (as in our OPAC pages); Python is going to be necessary for maintaining the CMS we’re working on.

3. CMS: I’m going to expand the idea of CMS to include DSpace, because that’s the other project we’re working on. The core of our DSpace install is going to be the electronic thesis repository, but in essence, the workflow within DSpace is such that it qualifies (to me anyway) as a CMS, just with less emphasis on access. The CMS we are going with is Plone, based on Zope, which uses Python for internal scripting. Eventually our website, our intranet, and our library subject pages will be on Plone.

4. Databases: Always. Somebody identified SQL as the HTML of Web 2.0, and while it’s probably more accurate to say that about XML, the principle still applies. XML files can be viewed as non-relational (or single-table) databases; and collections of XML documents as multi-dimensional (though still not relational) databases. This is how Solr sees its collections.

5. Interoperability: mostly between databases and either human-readable or programmable interfaces (web page vs. web service), mediated by scripting. The Model-View-Controller paradigm that I learned from Rails applies to almost all library or library related web applications. The database (SQL or XML, Solr or ILS) is the model; Ruby, PHP, Python, or JavaScript constitute the controller, and HTML or XML is the view. Each should be independent of the others.
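A toy illustration of that separation, in Python rather than Rails: an in-memory list stands in for the model, one function acts as the controller, and two interchangeable views serve a web page and a web service from the same data. The records, function names, and markup are all made up for the example.

```python
from html import escape

# Model: an in-memory stand-in for the database, Solr index or ILS.
RECORDS = [
    {"title": "Cats & Dogs", "creator": "Doe, Jane"},
    {"title": "Salt", "creator": "Roe, Richard"},
]

def find_by_creator(name):
    """Model query: records whose creator contains the given name."""
    return [r for r in RECORDS if name.lower() in r["creator"].lower()]

def html_view(records):
    """View for humans: an HTML list of titles."""
    items = "".join(f"<li>{escape(r['title'])}</li>" for r in records)
    return f"<ul>{items}</ul>"

def xml_view(records):
    """View for programs: a bare-bones XML response."""
    items = "".join(
        f"<record><title>{escape(r['title'])}</title></record>" for r in records
    )
    return f"<records>{items}</records>"

VIEWS = {"html": html_view, "xml": xml_view}

def controller(creator, output_format="html"):
    """Controller: fetch from the model, then hand off to the chosen view."""
    return VIEWS[output_format](find_by_creator(creator))
```

Because the controller never touches markup and the views never touch the data source, adding a third output format or swapping the list for a real database leaves the other two parts alone.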

6. Web services: both providing them in the form of a supported API (Ruby on Rails is a quick and easy way to get a web service up and running) and consuming (using scripting and HTML). Web services provide the basis not only for most Web 2.0 technologies (mashups, etc.) but will be one of the foundation stones of the Semantic Web, whenever that gets going. Technology librarians need to be able to quickly and simply write scripts to consume web service APIs, such as WorldCat’s grid services, Google Book Search, and any API provided by their ILS vendor. Being able to script web service consumption allows you to take advantage of the very flexible Z39.50 (now pretty much obsolete for any interesting use) and SRW/SRU.

I’ll try to write a bit more in the next few days about each of these things in more detail. We’ll see how that goes.

RedLibrarian 4.0

April 21st, 2008

Hello everyone. I believe this is the fourth incarnation of my Red Librarian blog, and for those of you who weren’t around back in 2005 when I started this venture, the three previous incarnations are as follows:

1. Microsoft FrontPage: Fresh out of our Applications for Information Management course I put up a crappy web-page with a theme and some tables. Draw-backs: only editable from my laptop; all content had to be uploaded to the Red Librarian server.

2. Blogger: Went with Blogger for the flexibility.

3. Serendipity: Which Dan Scott uses over at CoffeeCode. I liked the added functionality of his page, so I thought I would use that.

A while ago I returned to Blogger just for the hell of it.

Anyway, I’m hoping to start updating this blog a little more often, but unfortunately for all you non-systems people out there, this will probably entail more systems-related posts. I’ll do my best to tag everything properly, so that you can skip the boring bits.

That’s it. Feedback on the theme would be appreciated; I’m not hugely enamoured of it, so I’ll keep browsing the WP theme area in case something takes my fancy. Hopefully this summer I’ll be in the position to create a decent theme of my own.

Poka.