I'm one of those people who has about nine different email accounts at any given time, one of which is Gmail. Yesterday I discovered that about 50 legitimate emails had been diverted into my spam box this week, for no apparent reason. I'm guessing that some kind of spam bomb went off and pushed their filters way up the scale for a day or two. Up until this point it's been one of the most reliable and efficient spam filters I've used.
As a result I've been thinking about Google a lot the past few days. There's the new Verizon Motorola Android phone on the market. There's the Google Books settlement that just came out:
And an interview with their CEO talking about Google's successes and challenges:
I think the most interesting aspect of this interview is the problem of scale that they are up against. I've been noticing recently that the quality of searches has started to suffer, as they struggle (or fail) to keep up with the new media and blogosphere.
Once again the scale and structure of the internet are changing, and they may have to revisit a number of the assumptions that they built their search engine upon. Which leads me to the question of "how do you plan for fluidity?" How do you plan for a system that is dynamically alive and changing at a pace that only seems to accelerate?
I don't know if anyone has a really good answer for these questions, but I do think the world of computers has some solutions in the works, namely, what I like to call abstraction.
When I was learning to write a simple web application in PHP last spring, the first thing I did was write a bunch of 'classes' that would define objects. Those classes called the database and fetched the data, handling it in the terms defined. All of the PHP I wrote after that called on the classes, calling these 'objects' rather than calling directly to the database.
The beauty and elegance of this is that (in theory) you can change the structure of the database radically, and you only need to change the affected classes. The rest of the code can run, virtually unchanged, on those modified classes. I like to think of it as a type of data abstraction, where the code calls the database via an abstracted, mediated channel, rather than calling the database itself.
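To make that concrete, here is a minimal sketch of the idea in Python rather than PHP, using the standard sqlite3 module. The class names, the `titles_by` method, and the sample books are all made up for illustration; this is not the code from my application. Two classes expose the same method over two very different table layouts, and the "application" function never sees the difference.

```python
import sqlite3


class FlatBookStore:
    """Hypothetical class for a schema with one flat `books` table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
        conn.executemany(
            "INSERT INTO books VALUES (?, ?)",
            [("Dune", "Herbert"), ("Emma", "Austen"), ("Persuasion", "Austen")],
        )

    def titles_by(self, author):
        rows = self.conn.execute(
            "SELECT title FROM books WHERE author = ? ORDER BY title",
            (author,),
        )
        return [row[0] for row in rows]


class NormalizedBookStore:
    """Same public method, but authors now live in their own table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("CREATE TABLE books (title TEXT, author_id INTEGER)")
        conn.executemany(
            "INSERT INTO authors VALUES (?, ?)", [(1, "Herbert"), (2, "Austen")]
        )
        conn.executemany(
            "INSERT INTO books VALUES (?, ?)",
            [("Dune", 1), ("Emma", 2), ("Persuasion", 2)],
        )

    def titles_by(self, author):
        rows = self.conn.execute(
            "SELECT b.title FROM books b JOIN authors a ON a.id = b.author_id "
            "WHERE a.name = ? ORDER BY b.title",
            (author,),
        )
        return [row[0] for row in rows]


def shelf_label(store, author):
    """Application code: it only knows about titles_by(), never about SQL."""
    return f"{author}: " + ", ".join(store.titles_by(author))
```

Passing either store to `shelf_label` gives the same label, which is the whole point: the database was restructured, and only the class that touches it had to change.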
This is, however, not unique to PHP and web applications. As I understand it, all object-oriented programming languages function similarly, defining and calling objects, whether the definition is called a class or a library.
This is, again, not unique to object-oriented programming. All of the Linux machines I've used rely on what's called a 'hardware abstraction layer'. Basically, the operating system calls this abstraction layer, and the abstraction layer communicates with the hardware. One of the big problems with talking to the hardware directly is that if it fails to respond, the system hangs, freezes or crashes. So rather than writing a specific response to every type of possible failure, they rely on the hardware abstraction layer (HAL), which will communicate back to the operating system if the hardware fails to respond or perform normally. Furthermore, if the hardware changes, the operating system doesn't have to change the way it calls the hardware. In my opinion this is one of the main reasons that Linux has made it out of the geek pit and into the playing field. Prior to this, Linux was synonymous with 'hardware configuration nightmares'.
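Here is a toy version of that pattern in Python. It is only a sketch of the general idea as I've described it, not of how the real Linux HAL is implemented, and every name in it (`Device`, `WorkingDisk`, `DeadDisk`, `AbstractionLayer`) is hypothetical. The caller asks the layer for a read and always gets an answer back, even when the device behind it fails.

```python
from abc import ABC, abstractmethod


class Device(ABC):
    """Hypothetical interface that every driver in this sketch implements."""

    @abstractmethod
    def read(self) -> bytes:
        ...


class WorkingDisk(Device):
    def read(self) -> bytes:
        return b"data"


class DeadDisk(Device):
    def read(self) -> bytes:
        # Stand-in for hardware that never answers.
        raise TimeoutError("device did not respond")


class AbstractionLayer:
    """Sits between the 'operating system' and the devices."""

    def __init__(self):
        self.devices = {}

    def register(self, name, device):
        self.devices[name] = device

    def read(self, name):
        """Return (ok, payload_or_reason); failures are reported, not raised."""
        device = self.devices.get(name)
        if device is None:
            return False, "no such device"
        try:
            return True, device.read()
        except OSError as err:  # TimeoutError is a subclass of OSError
            return False, str(err)
```

Swapping `WorkingDisk` for some other `Device` subclass changes nothing for the caller, and a dead device shows up as an ordinary `(False, reason)` result instead of taking the whole program down with it.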
Which leads us back to the question, are there new ways that we can abstract our library functions, our information systems, such that the structures can change without having to create whole new systems? How will we afford flexibility even in our abstractions such that they too can change?
One library geek, two hands, three sock puppets, four hours of sleep, five books, and a six megabit internet connection.
Sunday, November 15, 2009
Saturday, November 7, 2009
Standing in the Temple of Interoperability and Extensibility
My research group likes to share links of interesting articles. This week the basket included:
Think Tank Stresses Importance of Information Sharing in Research and Teaching
Tim Berners-Lee: Machine-readable Web still a ways off
All of the links this week really get at the heart of a widespread need for interoperability, extensibility, and some standards for machine-readable contextuality.
I find it both really fascinating and totally counterintuitive that standards (when used properly) promote the creative (and unpredictable) expansion of the net, as they allow interoperability and sharing, and reduce the duplication of effort. I don't think that most people understand the degree to which they provide the substrate for the network to communicate. Because the network is so distributed and largely uncontrolled, it provides the power to aggregate across systems. That is, however, the problem as well.
However, in the history of solutions in this arena, from HTML to packet switching to email, you see a pattern of innovation that is simple, efficient, and elegant. The need for semantic structure in the web is building, across institutions and across disciplines, and I'm willing to bet that we'll see a solution start to take hold in the next few years. The pressure is building, and the dam will break. As such, I would argue that we are in one of those important historical moments where change is about to form itself before our eyes.
Since I've had too much coffee and not enough procrastination, I'd like to indulge in theorizing about what innovation matrix may come out to solve this problem. I say matrix because it's as much a sociological problem as a complex systems problem. I don't think any one technology or innovation is going to get us from here to wholesale adoption of something like RDF.
There are, however, some assumptions to predicting evolution. We have to assume that the nature of the system won't radically change, and that the assumptions that have given rise to the current system are not fundamentally flawed.
That being said, there are regularities to technological change. The innovation has to be simple enough to be distributed and elegant enough to be adopted. It has to be something that's obvious from the technology in hindsight. I see the following principles implicated:
Because of these factors, I don't think a centralized repository of metadata (semantic) information is feasible. The distributed, layperson-oriented and uncontrolled nature of the internet precludes a number of avenues of innovation.
However, I think the solutions will look mundane and yet powerful. Below are some that might pass through most of these limiting factors.
1. Embedding more semantic/metadata information within the hyperlinking html definition.
Why this would work:
It would only begin to address the problems of non-HTML documents on the web. In the end, the metadata really needs to be embedded in the document such that if you have the document you have the metadata.
2. Creating a standardized file wrapper that embeds the metadata, just like the embedded metadata of MP3s and the wrapping of digital video files to encompass multiple encoding types within a single file type. These two solutions could be extended to the entire world of online files.
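As a rough sketch of what such a wrapper might look like, here is one in Python. The layout is entirely hypothetical (a made-up `WRAP` signature, a length field, a JSON metadata header, then the untouched original file); it is not an existing standard, just the smallest thing I could think of that keeps the metadata and the document in one file.

```python
import json
import struct

MAGIC = b"WRAP"  # hypothetical 4-byte signature, not a real file format


def wrap(payload: bytes, metadata: dict) -> bytes:
    """Prefix the payload with a length-tagged JSON metadata header."""
    header = json.dumps(metadata, sort_keys=True).encode("utf-8")
    # ">I" packs the header length as a 4-byte big-endian unsigned integer.
    return MAGIC + struct.pack(">I", len(header)) + header + payload


def unwrap(blob: bytes):
    """Return (metadata, payload) from a blob produced by wrap()."""
    if blob[:4] != MAGIC:
        raise ValueError("not a wrapped file")
    (header_len,) = struct.unpack(">I", blob[4:8])
    header = blob[8 : 8 + header_len]
    payload = blob[8 + header_len :]
    return json.loads(header.decode("utf-8")), payload
```

Calling `unwrap(wrap(data, meta))` hands back the same metadata and the same bytes, whatever kind of file `data` happens to be, so if you have the document you have the metadata.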
This would require a bunch of details to be successful:
And I'm sure this hypothesizing shows the gaps in my knowledge as much as anything. But you have to admit that it's an intriguing issue to think about, regardless of how things fall out. And while this may not be the solution that takes hold, it is likely that it will be something in this vein.
