
Saturday, February 26, 2011

Fun but Fascinating Ramblings

Whatever the type of digitizing, the process depends on a translation of form. This has very interesting social and practical implications beyond the sizable concerns associated with information loss and the uncertainties of digital format longevity. The act of shifting, removing, or abstracting the physical properties of a document distorts meaning. It also enhances meaning. In a way, value is neither destroyed nor enhanced; it is just shifted based on cultural practices. It is important to meditate on a few of these points, if only briefly and idly.

If we think of documents as more easily transportable and accessible idea containers, then digitization is merely a method of advancing and augmenting these properties in the digital world. The loss of information and context that comes with distorting form through digitization results from an imperfect conversion. To me, this is no different from the distortions innate in manifesting pure idea into a socially digestible form via spoken language. It may be arbitrary, but for the case I am making, it is helpful to think of physical forms as containers. Each document is a vessel for ideas, and these ideas can be manifest in a number of ways. But there really is no platonic and perfect form of the idea that exists outside of its physical form. The form adds and diminishes meaning, but, more importantly, physical form enables the idea to live in a social environment. The form grants social dimension to the idea.

From an archival perspective, as much as from an IT perspective, the true value of an idea arises from its social and contextual significance. Value accretes as it moves through communities of use and consideration and is linked to other intellectual and physical developments. Context is key. An artifact in a museum setting is decontextualized. This is true whether the curator desires it so or not. Context has to be artificially built around decontextualized documents. This process could reflect an intention to restore context or to construct novel context.

Context emerges through associative information. This information could live in documents and experiences in near physical and intellectual proximity, but not necessarily. It is, however, always outside of and surrounding the document itself. The problems of value and scarcity (both integral to market economies) in the digital world are good examples. Copyright law and ideas of ownership and possession are confounded in the virtual world where nearly anything can exist anywhere at the same time.

Two ways of representing associative information that come to mind here are pastiche and palimpsest. Pastiche is a type of idea destruction and recreation. It's often now called remix, or mashup. It provides for the creation of new forms through the appropriation and decontextualization of prior forms. Many argue, reasonably, that all human creation is pastiche. In essence, the argument goes, we are all just absorbing ideas that existed before and around us, and through the application of our personality and experiences, combining them under the guise of innovation.

Palimpsest is a type of meaning that accumulates over time. It signifies a cycle of writing, erasing and rewriting. With each new iteration, though, the old is simultaneously obscured and reincorporated into the new. Think of a once vibrant and vital industrial park that falls on hard times, decays, lies fallow and unused, and then is rehabilitated into trendy shops or lofts. The process tells us more about a city and society than any one instantiation or role of the physical place and its arrangement.

Metadata is the primary, though not the only, reflection of the above-discussed associative information in digital and digitized documents. It describes a thing so that the thing can be found and used. Sometimes metadata proscribes, other times it prescribes use and meaning. It is always constructed. Often, though, it is constructed collaboratively, and over time, like the meaning in which the object itself resides.

Thursday, February 10, 2011

A few thoughts on Copyright

I experienced a copyright synchronicity today. I am studying copyright culture and policy in another class. It was good to have a new context and variant perspectives added to a discourse I am otherwise steeped in. The literature is amazing, insightful, and not at all boring. I will defer to the professional writers on the topic for analysis. But a few contradictions have been piquing my interest. They are more likely elucidative of human nature than they are of copyright. Regardless, here they are.

Most original American musical art forms rely on cultures of remixing and revision.  Some of the revision occurred with attribution and permission.  Most did not. Could Jazz happen today under current copyright law?

The copyright-holding industries have made a great deal of profit from the use of materials contracted from creators. They also present themselves as the representatives and defenders of these creators. I could argue that copyright law has become unwieldy and unkind to artistic creations. It makes it more difficult to create. Yet major copyright holders push for more and more stringent regulation and enforcement.

Most of the industries that have lobbied to strengthen copyright made, in their origins, a great deal of profit from its laxity at the time they were emerging. Thomas Edison, for instance, was a great inventor and passionate defender of patent and copyright, but he was also a flagrant copyright/patent thief:
http://uk.answers.yahoo.com/question/index?qid=20090808045443AAOmUNs

Discussion Thread

So, Carlos deftly sidestepped the concept of balance as a rationale for copyright. I say deftly because it feels (to me) naive and outmoded to conduct a useful analysis of general copyright policy under the consideration of interest balancing. I believe this for two reasons. One is the general inertia of the legislature and the judiciary toward further limiting and constraining the interests of end users, particularly in regard to digital materials, while expanding the coverage and terms of copyright. The second is that for balance to occur, you need parties with opposing interests to seek compromise. Mostly, I'm not sure the general public has genuine representation.

Balance, however, is essential to copyright exceptions like fair use, which are, in turn, essential to non-profit digitization and online access projects. I believe fair use was developed to accommodate libraries and other cultural institutions in the early days of copyright legislation. My concern is this: given the general government trend towards austerity and its dire consequences for cultural institutions, the historical expansion of copyright, the lobbying power of the media industries, and (arguably) public indifference, is fair use becoming an artifact of the past? What does this mean for preservation? Do we need a reaffirmed, more clearly described statutory version of fair use? If so, where will this push come from and what sort of coalition will it be? Or would eliminating the vagaries surrounding fair use be detrimental to people with a more hesitant approach to copyright expansion and encroachment?

Thursday, February 3, 2011

ABBYY

If I could enumerate my first OCR ABBYY experience in the tone of a survival log, it would go something like this:


Day 1: Today I awoke, confused and not without a vague fear, on some alien beach. Though, I survived the crash. I guess I should count myself fortunate. While I am destitute and lost, I am ALIVE. Though what kind of life awaits?
Day 2: Upon initial survey of my beach (as I have come to call it) and its close environs, my spirits are much improved. This place is a veritable Eden and if I am to be a lone Adam, at least I will be just as well nourished. Tomorrow I will build a shelter and put my claim on this place in earnest.
Day 3: Disaster! The fruit I collected is poisonous. I am doubly wrecked. On top of these misfortunes, my shelter is slow in coming and a storm is on the horizon.
Day 4:  ...Why me!? All is lost...

Maybe that was a bit dramatic. Anyways, here's my evaluation of my first experience OCRing about 70 typed pages. It started off pretty well, running through the pages I had previously scanned, using the spell checker to review and adjust the low confidence characters. Then I realized that ABBYY assigns different classes to features of a scanned object. So, ABBYY will sometimes see a header or a title and assign it a "title text" value and then assign the body a "body text" value. Fine. Except that text considered "title" is ascribed a value that makes it UNMOVABLE. I am sure that there is a perfectly good reason for this...no, no, it's really stupid. Why would anyone, ever, want static text on a computer? Plus, it's inconsistent: on some pages it will identify a title text box and on others it will treat the entire document as body text.

Words are hardly adequate to express my frustration after I completed OCRing a 37-page document only to discover that the "title" text was completely useless. When converting the text to a Word document, ABBYY passes the "title" text's opaque and untouchable (to me) properties over. This has dire effects on page and text position. Sometimes body text would be superimposed over the title text. So I had to go back and repeat about an hour of work to hack the document into editable form. I was also unsuccessful at finding a way to default all character types to "body" to automate and expedite the process.

I know most of my gripes grow out of my ignorance of and inexperience with the software. I will say, though, that even when I did find a good routine for processing the text, I still felt like I was hacking the software--fighting with it to make it do what I wanted. Not intuitive. Of course, when I say hacking, I mean it in the traditional sense of the word that implies curious, healthy exploration of technology, not the sense that people who drink deeply of the Fox News Kool-Aid would understand. I love to hack, but not when I have a pile of menial labor to grind through. I felt like an operator removed, like I was driving ABBYY from the back seat with broomsticks and mirrors.

Not fun...

The transcripts, however... What wonderful tales of oil conquest, labor issues, drunk dogs, Howard Hughes and snakes. Reading the transcripts, I felt a distant connection with my departed paternal grandparents. They were not involved in the oil industry, but lived in Amarillo. Their reason, sensibility, practicality and ingenuity were detectable in the rhythm and spirit of speech of the various transcribed conversations. The experience brought me face to face with the importance of this work--albeit for a personal reason. So, ABBYY be damned, I will soldier on.

Reading Reflections week 1-2

The dialog surrounding the myriad problems and uncertainties facing preservation and preservation technologies in a mostly digital (created, disseminated and stored) environment was interesting, particularly the discussions in the Cohen and Rosenzweig and the Conway articles.

Though they addressed some of these issues and touched on others, and granting that the articles are ancient in computer years, I thought that the major digitization issues were largely glossed over. I found myself puzzled at the focus on preservation modes and formats. Does it really make sense to think about computers as discrete units? In the same sense, does digital preservation mean finding a stable media platform? The strengths of the internet reside in its distributed and dispersed nature. Stability is an antiquated notion. Conversely, I think the shifting, dynamic and collaborative aspects of the internet pose more opportunities for preservation than risks. The focus should be on collaborative projects, distributed automation like the SETI@home or Folding@home projects, and LOCKSS, and, most of all, an affirmative shift to multiple open standards and frameworks. What do you think, sirs?

Tuesday, February 1, 2011

My History with Digits

I'll preface by repeating something I once read: "hypocrisy is a tribute that vice pays to virtue." If so, maybe blogging is the tribute that vanity pays to public expression. As few would find my strengths in any of those things, I beg indulgence and patience in this my first(!) blog.

Describe my history with digitization? Okay, easy enough. For the purposes of this exercise, Quinn defines digitization as, “...the process of taking things that are accessible in the physical world, like writings on paper, pictures on film, sounds on a radio, video from a television, and making them accessible in the digital...”. It is a definition mindfully tailored for this assignment, which is both accessible and simple without being simplistic. And I would be hard pressed to think of a better one. But phrased like that, it becomes complex to survey my history with digitization. Not so easy after all.

To me this definition hinges on two factors: conversion and access. Digitization is the act of converting or shifting one format into another and then, more significantly, making the object accessible. When did I start doing this? Just considering this question perplexes my memory. Disposable simplifications and absurd aimless abstractions aside (though, what else are blogs for) I'm not sure I can determine a point in my life when I wasn’t digitizing or being digitized. Even if I focus literally on binary to bolder type situations, a history is difficult to describe.

For the purpose of the exercise, I'm guessing I should address the question with a measurable benchmark. Okay, so, in high school I built a computer. I did it to play games. Back then you could build a fast computer at a third of the price of what you could buy prebuilt. It was an agonizing experience and one I didn't actually enjoy all that much. After all, it was a means to an end: the games. Unfortunately, having the determination to throw some keyed parts together, after hours of trial and error (I blame Legos), gave people the mistaken impression that I was a “computer guy.” And from that point forward, I have been working with computers on a technical level. Many of the issues people would enlist me for related to scanning or format conversion. I've done a fair share of scanning solely in the pursuit of teaching. I love computers now. And not just for games or spell check.

Though, this is merely the “college application” version of my digital experience. As a snapshot, it is blurry and trite and wants elaboration.

My earliest memories involve all sorts of now lost “computers” like the family PC, my NES or my awesome Casio digital watch. I'm guessing this experience is common for anyone under 35. I can still hear my Speak & Spell rewarding my clumsy finger presses with its charming electronic drone. Maybe some cynic would argue that this has nothing to do with digitization. After all, there wasn't an output: a converted object that is accessible and interactive. I would reply, in a self-conscious, half-ironic yet defiant tone, that I was the output.

Time with these machines shaped my behavior, personality and understanding of how the world sometimes worked. How it SHOULD work. The physical and the digital were interwoven in the tapestry of my life. They still are. The ability to draw a line of demarcation between them, to be able to say “this is digital and this is physical” is a model that I adopted to better communicate with adults. I firmly believe that few things now are pure and untainted by a digital-physical seepage. I know because I am a product of this interdependency.

I believe that computer networks are re-socializing society. My young self's naive notions of the natural interlocking features of the digital and physical worlds are playing out very dramatically now, both in the possible dusk of the internet “as-we-know-it” and the dawn of a new “digital” library. The carbon paper to compressed file, paper to petabyte (etc.) type of digitization is an integral part of this ecosystem.