Sunday, November 15, 2009

When all you have is the reaction...

I'm one of those people who has about nine different email accounts at any given time, one of which is Gmail. Yesterday I discovered that about 50 legitimate emails had been diverted into my spambox this week, for no apparent reason. I'm guessing some kind of spam-bomb went off and pushed their filters way up the scale for a day or two. Up until this point it's been one of the most reliable and efficient spam filters I've used.

As a result I've been thinking about Google a lot the past few days. There's the new Verizon Motorola Android phone on the market. There's the Google Books settlement that just came out:

http://news.cnet.com/8301-1023_3-10397787-93.html?part=rss&subj=news&tag=2547-1_3-0-20

And an interview with their CEO talking about Google's successes and challenges:

http://news.cnet.com/8301-30966_3-10396865-262.html?tag=rtcol;inTheNewsNow

I think the most interesting aspect of this interview is the problem of scale that they are up against. I've noticed recently that the quality of searches has started to suffer as they struggle (or fail) to keep up with the new media and the blogosphere.

Once again the scale and structure of the internet is changing, and they may have to revisit a number of the assumptions they built their search engine upon. Which leads me to the question: how do you plan for fluidity? How do you plan for a system that is dynamically alive and changing at a pace that only seems to accelerate?

I don't know if anyone has a really good answer for these questions, but I do think the world of computers has some solutions in the works, namely, what I like to call abstraction.

When I was learning to write a simple web application in PHP last spring, the first thing I did was write a bunch of 'classes' that would define objects. Those classes queried the database, fetched the data, and handled it in the terms the class defined. All of the PHP I wrote after that called on these 'objects' rather than calling the database directly.

The beauty and elegance of this is that (in theory) you can change the structure of the database radically and only need to change the affected classes. The rest of the code runs, virtually unchanged, on those modified classes. I like to think of it as a type of data abstraction, where the code calls the database via an abstracted, mediated channel rather than calling the database itself.
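Here's a minimal sketch of the pattern, with the class name, table, and fields invented purely for illustration:

```php
<?php
// A hypothetical data-access class: everything else in the
// application asks this object for data instead of querying
// the database directly.
class User
{
    private $db; // a PDO database connection

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    // If the users table is restructured, only this method changes;
    // calling code still receives the same associative array.
    public function find($id)
    {
        $stmt = $this->db->prepare(
            'SELECT id, name, email FROM users WHERE id = ?');
        $stmt->execute(array($id));
        return $stmt->fetch(PDO::FETCH_ASSOC);
    }
}

// The rest of the code calls the object, never the database:
// $user = new User($pdo);
// $row  = $user->find(42);
```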

This is, however, not unique to PHP and web applications. As I understand it, all object-oriented programming languages function similarly, defining and calling objects, whether the definition is called a class or a library.

This is, again, not unique to object-oriented programming. All of the Linux machines I've used rely on what's called a 'hardware abstraction layer' (HAL). Basically, the operating system calls the abstraction layer, and the abstraction layer communicates with the hardware. One of the big problems with talking to hardware directly is that if it fails to respond, the system hangs, freezes, or crashes. So rather than writing a specific response to every possible type of failure, the operating system relies on the HAL, which reports back if the hardware fails to respond or perform normally. Furthermore, if the hardware changes, the operating system doesn't have to change the way it calls the hardware. In my opinion this is one of the main reasons Linux has made it out of the geek pit and onto the playing field. Prior to this, Linux was synonymous with 'hardware configuration nightmares'.
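This isn't how Linux actually implements its abstraction layer (that's C, and far more involved), but the shape of the pattern can be sketched in a few lines of PHP; every name below is invented for illustration:

```php
<?php
// A toy mediation layer: callers go through readOrReport(), which
// turns a hardware failure into a reportable error instead of
// letting the caller hang on an unresponsive device.
interface Device
{
    public function read(); // may throw DeviceException
}

class DeviceException extends Exception {}

class AbstractionLayer
{
    private $device;

    public function __construct(Device $device)
    {
        $this->device = $device;
    }

    // Calling code uses this one method for any device, so the
    // hardware can be swapped out without changing the callers.
    public function readOrReport()
    {
        try {
            return $this->device->read();
        } catch (DeviceException $e) {
            return array('error' => $e->getMessage());
        }
    }
}
```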

Which leads us back to the question: are there new ways we can abstract our library functions and our information systems, such that the structures can change without our having to build whole new systems? And how do we build flexibility into the abstractions themselves, so that they too can change?

Saturday, November 7, 2009

Standing in the Temple of Interoperability and Extensibility

My research group likes to share links of interesting articles. This week the basket included:

Think Tank Stresses Importance of Information Sharing in Research and Teaching

Tim Berners-Lee: Machine-readable Web still a ways off

All of the links this week really get at the heart of a widespread need for interoperability, extensibility, and standards for machine-readable contextuality.

I find it both really fascinating and totally counterintuitive that standards (when used properly) promote the creative (and unpredictable) expansion of the net, because they allow interoperability and sharing and reduce the duplication of effort. I don't think most people understand the degree to which standards provide the substrate over which the network communicates. Because the network is so distributed and largely uncontrolled, standards are what give it the power to aggregate across systems. That distributed, uncontrolled nature is, however, the problem as well.

However, in the history of solutions in this arena, from html to packet switching to email, you see a pattern of innovation that is simple, efficient, and elegant. The need for semantic structure in the web is building, across institutions and across disciplines, and I'm willing to bet that we'll see a solution start to take hold in the next few years. The pressure is building, and the dam will break. As such, I would argue that we are in one of those important historical moments where change is about to take form before our eyes.

Since I've had too much coffee and not enough procrastination, I'd like to indulge in theorizing about what innovation matrix may emerge to solve this problem. I say matrix because it's such a complex systems and sociological problem: I don't think any one technology or innovation is going to get us from here to wholesale adoption of something like the RDF.

There are, however, some assumptions built into predicting evolution. We have to assume that the nature of the system won't radically change, and that the assumptions that gave rise to the current system are not fundamentally flawed.

That being said, there are regularities to technological change. The innovation has to be simple enough to be distributed and elegant enough to be adopted. It has to be something that, in hindsight, looks obvious given the technology that preceded it. I see the following principles implicated:

  1. It has to be a simple step from the current technology to the innovation. For example, Twitter is an obvious innovation if you look at blogging culture, sms culture, and the adoption of mobile web applications into everyday life. It's an extensible platform that can run on the most basic of web-enabled phones, allowing blogging impulses to happen away from the desk and without the strange and tiered costs of sms.
  2. It has to be based on obvious and available reconfiguration of current technologies, but remixed in a way that isn't inherently obvious, or it would have been done already. In other words, it needs to take advantage of the current strengths of the system. It's not going to be something built in a free standing vacuum to address this problem.
  3. There needs to be either a low barrier or a high incentive for adoption. In other words, it has to be either easy or rewarding, which is a complex sociological calculation in itself. Regardless of the complexity of estimating this variable, there is a strong underlying argument for ease and simplicity within this factor. Basically, it needs to be cheap to adopt.
  4. There is also a strong argument for a multifaceted rewards system. In other words, it needs to maximize the number of groups and possible ways that it can benefit the system. If it only benefits a small group, it probably won't be adopted, regardless of the size of the benefit. In order to get a large enough net benefit, we might need to aim for a solution that yields a smaller benefit to a larger number of users.
  5. It needs to be injected into a place where it can take root in a sizeable portion of the population of online documents. It doesn't matter how simple, elegant, or functional it is if no one is using it. One word for this problem: Betamax. Size matters when it comes to adoption and market share. The most successful path for the retroactive injection of structure is probably at the lowest attainable level.
  6. Centralization is tenuous in an online environment. The only centralized examples that work are ones that are highly open and interoperable, such that content can move fluidly between systems, or where demand is high.
  7. There is a strong generalized need for semantic structure on the web, but virtually no focused demand, because of a kind of positivistic myopia. The deficiency of the system isn't immediately visible, even to the most proficient users: they don't know what their searches aren't returning. We are trained to optimize our thinking within the current structure. It's a nebulous problem without clear solutions or villains. That's what makes it an interesting problem to watch, because it means the crystallization of the solution will be somewhat unpredictable, but when it does crystallize it will precipitate a whole new iteration.
  8. The solution must be extensible. The system is a moving target, and the solution must either grow with or shed existing structures. Any successful solution must provide for both the present and the future: we must be able to add things to it that we cannot conceive of now, because of the nature of the system.

Because of these factors, I don't think a centralized repository of metadata (semantic) information is feasible. The distributed, layperson-oriented, and uncontrolled nature of the internet precludes a number of avenues of innovation.

However, I think the solutions will look mundane and yet powerful. Below are some that might pass through most of these limiting factors.

1. Embedding more semantic/metadata information within the hyperlinking html definition.
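To make this concrete, here's what such a link might look like; the extra attributes below are invented purely for the sake of the example and are not part of any existing standard:

```html
<!-- Hypothetical: the data-* attributes here are illustrative only. -->
<a href="http://example.org/papers/semantic-web.pdf"
   rel="citation"
   title="A paper on semantic structure"
   data-author="J. Smith"
   data-published="2009-04"
   data-format="application/pdf">
   an interesting paper
</a>
```

The link itself would then carry machine-readable context about its target, harvestable by any crawler.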

Why this would work:
It wouldn't be a final solution. It doesn't provide the extensive benefits that the RDF would, but it could bridge the problem significantly, with distributed labor, low cost, low overhead, integration into the existing structure of the internet, and distributed benefits.

Why it wouldn't be enough:
It would only begin to address the problem of non-html documents on the web. In the end, the metadata really needs to be embedded in the document itself, such that if you have the document, you have the metadata.


2. Creating a standardized file wrapper that embeds the metadata, just as mp3s embed metadata and digital video containers wrap multiple encoding types within a single file type. These two solutions could be extended to the entire world of online files.

This would require a bunch of details to be successful:
  • The standards would need a library of definitions hosted centrally, just like for document type definitions within html/xhtml. This is so that they can be extensible, grow over time, and embed only the minimal amount of information in the file itself.
  • The metadata should also include document versioning information. There is such a need for embedded versioning information that this alone could drive adoption rates. Imagine if every document you emailed out for review came back with an edit date/time stamp for every user, regardless of which platform you or they used. The technology is simple and at our fingertips; think of it as email headers for web files. Semantic file management problems are not restricted to the internet. We need the next generation of embedded file information in a global way.
  • The wrapper would need to be based on a simple open architecture, such as xml (a rough sketch of what this might look like follows this list).
  • The wrapper would require broad-spectrum buy-in from the software and gadget community. It would need to be designed and supported by software manufacturers such that it could be automatically generated when saving a file. If Microsoft and Adobe supported this endeavour, I believe virtually all other software makers would follow suit. It's not inconceivable that they would do this, and do it openly, given the history of the development of the pdf and the ms office document types. Google, Microsoft, and Yahoo all have a great deal to gain from having a voice at the table with regard to increased semantic structure of documents.
  • Support would need to be such that users could save and modify the wrapper information on a variety of platforms and applications, preferably on the fly, without actually opening the document. Browsers could be adapted so that the wrapper information is optionally editable upon download of a file (just like the filename and download location). Browsers could then also take contextual information from the website to pre-populate fields. Just as citation management software can recognize citations, browser and application plugins could be written to read and populate metadata fields on the fly while saving documents.
  • There could be scripts written to automate the wrapping of current document repositories on a large scale. Arguments could be made for publishers to do this so that their files can be tracked for copyright purposes.
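Here's a rough sketch of what such a wrapper might look like; every element name below is invented for illustration, and a real standard would obviously be hashed out by the parties at the table:

```xml
<!-- Hypothetical wrapper format; all element names are illustrative. -->
<wrapped-file definition="http://example.org/wrapper/core-1.0">
  <metadata>
    <title>Quarterly Report</title>
    <creator>J. Smith</creator>
    <format>application/pdf</format>
    <!-- embedded versioning: an edit trail that travels with the file -->
    <version number="2" edited="2009-11-02T14:31:00Z" editor="J. Smith"/>
    <version number="1" edited="2009-10-28T09:05:00Z" editor="A. Jones"/>
  </metadata>
  <payload encoding="base64"><!-- the original file's bytes --></payload>
</wrapped-file>
```

Note that the wrapper itself stays minimal: the definition attribute points back to a centrally hosted library of definitions, which is what would let the format grow over time.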
This proposition is certainly more ambitious, but then again the idea of the semantic web is highly ambitious. I would also argue that it is far more attainable and negotiable than the direct implementation of the RDF (because so few people can do anything in xml). Furthermore, it seems like this would support the implementation of the RDF, as the RDF could be one of the metadata types supported, complementing everything that has been developed so far.

And I'm sure this hypothesizing shows the gaps in my knowledge as much as anything. But you have to admit that it's an intriguing issue to think about, regardless of how things fall out. And while this may not be the solution that takes hold, it is likely that the winner will be something in this vein.

Tuesday, October 27, 2009

WorldCat Identities

I'm working on a paper for one of my classes and in the process I stumbled upon this beta 'identities' site from WorldCat. While I've come down on both sides of WorldCat from time to time, I think this feature for aggregating information about a single author is pretty innovative and useful. Here's an example:

http://worldcat.org/identities/lccn-n83-71663

Wednesday, October 14, 2009

Zotero v EndNote

I'm a Zotero user; I won't mince words about that. After working with RefWorks, EndNote, and Zotero, I was won over pretty easily. It was the thing I had been holding out for, as I had more or less managed without any citation management tools for years at a time. Which is why I was pleasantly surprised to read this:

National Science Foundation Hires Zotero

We are delighted to announce that the National Science Foundation (NSF) Engineering Research Centers Program in the Division of Engineering Education and Centers has hired the Center for History and New Media (CHNM) to provide a customized interface for NSF’s internal use. NSF had already been using Zotero for some time, and based on positive experience with the software, NSF contracted with CHNM to extend Zotero to meet the organization’s needs more fully.

Ironically, as part of working on an NSF grant, I may have to move the relevant portions of my library over to EndNote, the program users love to hate. This may well be how things come full circle in the world of large bureaucracies...

Thursday, August 20, 2009

Computery Goodness

I've been tinkering with multiple operating systems on my laptop. This is nothing new, but rather something I've been doing for going on four years now, with various OS's and laptops. Today, however, I achieved a new level of geekiness.

Today I restored my installation of Windows XP. Rather than reinstalling and reconfiguring the twenty-some-odd programs I use on a daily basis, I did the following:

1. Made a copy of the entire Windows XP partition to an external hard drive (while in Linux, because Windows gets sticky about certain system files)
2. Reformatted the original hard drive, creating an empty partition for Windows to reside in, and installed the operating systems that required the reformatting of the disk.
3. Copied the entire Windows partition back onto the blank partition.
4. Used my original Windows installation disk to go into 'repair' mode and execute the command 'fixboot'.
5. Voila: my Windows XP works exactly as it did yesterday before I made the backup, in a fraction of the time it would have taken to reinstall everything.
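For the curious, here's roughly what steps 1, 3, and 4 look like as commands. The device names and mount points are examples only and will differ on your machine, so double-check them with something like 'fdisk -l' before copying anything:

```bash
# Step 1 (from Linux): image the Windows partition to an external drive
dd if=/dev/sda1 of=/mnt/external/winxp.img bs=4M

# Step 3 (after repartitioning): copy the image back onto the new
# partition, which must be at least as large as the original
dd if=/mnt/external/winxp.img of=/dev/sda1 bs=4M

# Step 4: boot the Windows install CD, choose 'repair' (the recovery
# console), and run:
#   fixboot
```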

But I bet right now you're wondering what sort of practical use this has for your life. Well, should you be one of the masses using Windows XP in your personal life, you can easily download a thumb drive version of linux (such as Puppy), install it on an old flash drive you have lying around, and then use it to make a backup of your entire hard drive. Then, should you ever get a virus, have a hard drive failure, or whatnot, you can have a new disk up and running with all of your old programs installed in a fraction of the time.

And believe me, hard drive failures happen to everyone. You don't want to spend your days looking for spare hard drive parts on ebay so that you can squeeze one last read out of a dead drive.

Monday, August 10, 2009

Living in the City of Beautiful People

This fall I'll be starting a doctoral program at UCLA. Aside from the obvious issues of graduate study, moving to LA has been full of other things as well. For instance, our apartment falls pretty squarely between West Hollywood and Hollywood proper. If we walk four blocks south, we find ourselves in a transitional neighborhood of aging Russian Jews and young, fashionable gays. If we walk four blocks east we are surrounded by tourists and highly manicured fashionistas trying to make it in the entertainment industry. Every day I feel like I'm in some sort of strange movie set, and then I realize that I am.

Then there are the little things that make life so colorful. Like not having power for 5 days. I guess the previous tenant was enough of a deadbeat that I had to show up in person at the LA Dept. of Water and Power with a copy of my lease and photo ID. It wouldn't be such a hassle if they were open on the weekends...

But being a technophile without power leads to the question of what do I do with myself when the technology fails me. I like to listen to music, watch movies, cook, listen to the radio, read, all of which become more complicated with no power. No fans, no refrigerator, no stereo, no lights, no wasteful hours of internet surfing, but then again, I can't remember the last time I ordered take-out for a candle-lit dinner, so perhaps I shouldn't complain.

Thursday, July 9, 2009

Link of the Day - Dial a Poet

In continuing the theme of politics and poetics, here's a tidbit from a neat archive on art and music:

(Dial-a-Poem was a sort of answering machine system where people like John Giorno, Allen Ginsberg, and Patti Smith recorded poetry for people to call in to. Perhaps it could be considered an odd precursor to the podcast.)


DIAL-A-POEM HYPE


One day a New York mother saw her 12-year-old son with two friends listening to the telephone and giggling. She grabbed the phone from them and what she heard freaked her out. This was when Dial-A-Poem was at The Architectural League of New York with worldwide media coverage, and Junior Scholastic Magazine had just done an article and listening to Dial-A-Poem was homework in New York City Public Schools. It was also at a time when I was putting out a lot of erotic poetry, like Jim Carroll's pornographic "Basketball Diaries," so it became hip for the teenies to call. The mother and other reactionary members of the community started hassling us, and The Board of Education put pressure on the Telephone Company and there were hassles and more hassles and they cut us off. Ken Dewey and the New York State Council on The Arts were our champions, and the heavy lawyers threatened The Telephone Company with a lawsuit and we were instantly on again. Soon after, our funds were cut, and we couldn't pay the telephone bill, so it ended.


Then we moved to The Museum of Modern Art, where one half the content of Dial-A-Poem was politically radical poetry. At the time, with the war and repression and everything, we thought this was a good way for the Movement to reach people. TIME magazine picked up on how you could call David and Nelson Rockefeller's museum and learn how to build a bomb. This was when the Weathermen were bombing New York office buildings. TIME ran the piece on The Nation page, next to the photo of a dead cop shot talking on the telephone in Philadelphia. However, Bobby Seale, Eldridge Cleaver and The Black Panthers were well represented. This coupled with rag publicity really freaked the Trustees of the museum, and members resigned and thousands complained and the FBI arrived one morning to investigate. The Museum of Modern Art is a warehouse of the plunder and rip-off of the Rockefeller family, and they got upset at being in the situation of supporting a system that would self-destruct or self-purify, so they ordered the system shut down. John Hightower, MOMA Director, was our champion with some heavy changes of conscience, and he wouldn't let them silence us, for a short while. Then later John Hightower was fired from MOMA, and Ken Dewey, recently flying alone in a small plane, crashed and died.

In the middle of the Dial-A-Poem experience was the giant self-consuming media machine choosing you as some of its food, which also lets you get your hands on the controls because you've made a new system of communicating poetry. The newspaper, magazine, TV and radio coverage had the effect of making everyone want to call Dial-A-Poem. We got up to the maximum limit of the equipment and stayed there: 60,000 calls a week, and it was totally great. The busiest time was 9 AM to 5 PM, so one figured that all those people sitting at desks in New York office buildings spend a lot of time on the telephone; the second busiest time, 8:30 PM to 11:30 PM, was the after-dinner crowd; then came the California calls and those tripping on acid or who couldn't sleep, 2 AM to 6 AM. So using an existing communications system we established a new poet-audience relationship.

Dial-A-Poem began at the Architectural League of New York in January 1969 with 10 telephone lines and ran for 5 months, during which time 1,112,337 calls were received. It continued at MOMA in July 1970 with 12 telephone lines and ran for 2 and a half months, and 200,087 calls were received. It was at The Museum of Contemporary Art, Chicago for 6 weeks in November 1969 and since then has cropped up everywhere. This was with equipment working at maximum capacity and sometimes jamming the entire exchange. At MOMA, the 12 lines were each connected to an automatic answering set, which holds a pre-recorded message. Someone calling got, at random, one of 12 different poems, which were changed daily. There were around 700 selections of 55 poets.

On this LP of Dial-A-Poem Poets are 27 poets. The records are a selection of highlights of poetry that spontaneously grew over 20 years from 1953 to 1972, mostly in America, representing many aspects and different approaches to dealing with words and sound. The poets are from the New York School, Bolinas and West Coast Schools, Concrete Poetry, Beat Poetry, Black Poetry and Movement Poetry.

John Giorno, August 1972


MP3 recordings can be found here:

http://www.ubu.com/sound/gps.html

Wednesday, July 8, 2009

Link of the Day:

http://twitter.com/QueensSpeech

International LGBT news snippets, based out of London.

Although I'm kinda bummed, because their regular website (http://www.queensspeech.com) is down with a db error and it's the second site in a row I've found today with database errors. Did MySQL go on strike today?

Monday, February 16, 2009

Just because the technology is there...

Last week I attended a panel of library directors giving tips to library students on entering the job market. They discussed resumes and interviews and gave the perspective of the people on the other side of the table. By and large, it was all things I'd heard dozens of times already. But there were a few points where I was both shocked and appalled by the discussion.

The first thing that stuck in my throat was the comment from a male academic library director who stated:

"When you're going on the job market, manage your online brand of yourself, because I'm going to google you, and if I don't like what I see, then you're out."

I've heard this statement before in various contexts, so it was not unexpected or unheard of, but I still couldn't quite swallow it whole. I had a strong gut reaction to it. Initially I felt that looking at sites such as facebook or myspace to make hiring decisions would constitute an invasion of privacy. These social sites are almost entirely unrelated to professional activities and are the very sites that are blocked in many public libraries and federal agencies. It didn't sink in why it was bothering me so much until some other students brought up the issue of how to handle discriminatory interview questions as females in their child-bearing years. Here was an untraceable way for a manager to engage in that kind of discrimination without ever having to disclose it.

I believe in a sharp distinction between my professional and personal lives. What I do at work I would like to leave as work, and what I do at home I would like to leave as my home life. However, because I have worked in LGBT archives and history projects, I've gotten some bleed-over between the two, especially in terms of my 'internet branding of myself'. When I worked at the LGBT Resource Center at the University of California, Davis, the director of the center was perfectly candid in warning me about potential hiring discrimination if I listed the job experience on my resume. It was general practice to suggest to all the employees that, as an alternative to listing the LGBT Center as an employer, we could always list our parent department, the Cross Cultural Center, instead. However, as a result of this employment, I'm still cited on various LGBT higher education websites as a resource or contributor. There's no hiding that I was playing on the gay team. But why should I be ashamed to have worked in a well-paying, legitimate, and complex work environment within student affairs at a Tier I research institution?

There is also the problem that with every passing year, there is more bleed over from the physical to the digital. Every year it gets easier to carry over ephemera from the physical world into the digital one. Every year there are more pictures of pride parades, more pictures of political rallies, more pictures of nightlife posted on the internet. I see more and more of people's personal lives becoming accessible on places like facebook and flickr as their lives become integrated with digital technologies. It has become simple to use these social networking tools to facilitate cocktail parties, network with high school friends, chat with classmates, etc., but it is being done in a forum that is now being recorded, word for word, picture for picture, and may also be accessible to outsiders. I don't think we should underestimate the implications that this has for fundamental changes to our conception of privacy.

However, regardless of what bleeds over from the physical world into the digital, the fact remains: just because something is possible does not mean it should be considered ethical. The ability to google a potential hire opens the door for a manager to engage in discriminatory hiring practices. It allows the manager to probe parts of a candidate's background that would be unquestionably unethical to ask about in the job interview itself. It is also highly problematic to require candidates to mask every personal element of their digital lives that could form the grounds of discriminatory hiring practices.

Beyond the issue of offering up information that ethically shouldn't be used to make hiring decisions, I have other questions about using Google to evaluate potential hires. If a manager makes hiring decisions based on a medium that librarians, by and large, universally disparage as an unreliable source of information, it calls into question the manager's core competencies as an employer. Does the manager have the skills to conduct a successful job search without resorting to unverifiable sources of information (such as facebook or myspace)? I used to work with a billing system where we had full access to all the personal information imaginable that the university retained, and we still regularly confused patrons and had difficulty establishing matching identities between the billing system and our patrons. I don't think we were particularly incompetent; it's just that there were hundreds of thousands of people in that database. And if you expand that search process to the internet at large, the pool of potential hits expands into the millions.

Finally, it is also important to note that librarians will look hypocritical and ineffectual if they take a stand to protect the privacy of their patrons while ignoring the privacy of their employees, even if those employees are only potential ones.

Tuesday, January 27, 2009

Strange Days

I found a journal article for sale on amazon.com today for the low, low price of $5.95:

http://www.amazon.com/ceiling-mounted-airborne-concentrations-building-TECHNICAL/dp/B0009738BW/ref=sr_1_16?ie=UTF8&s=books&qid=1233113596&sr=8-16

It's the first time I've seen an academic article for sale on amazon. I wonder if it's a viable paradigm. It seems quite odd to me.

Thursday, January 22, 2009

I knew there was something different here...

Somehow, after having spent most of my life in California, I never fail to be amazed at the stark gulf between the haves and the have-nots of that place. Today I was reading about how the National Center for Education Statistics built this statistical model of literacy rates:

http://nces.ed.gov/naal/estimates/

The interesting thing about the model is that you can compare rates of the minimal levels of literacy between counties and states across the country, and it will show you the rates as well as various confidence intervals.

So naturally, I compared the place where I was born with the place where I am now (using the data collected in 2003). The model estimated that approximately 24% of the population in the county where I was born lacks basic literacy skills. And now I live in a county where the model estimates only about 5% of the population to be lacking in basic functional literacy skills.

Why can't I shake the feeling that these statistics are deeply related to other problems of race and class?