Sunday, May 16, 2010

Neural Networks - Is Google Invading the Space?

How big is the Web? The last figures I saw (February 2010) estimated 750 million websites worldwide, plus 200 million blogs, and there are surely other domains that Google scans. Other figures suggest 25 billion indexed web pages (Netcraft, March 2009), but that number will have grown considerably since then. In this context, we take 'Google' to be the combination of a search engine and an instantaneous results set across all website and blog resources worldwide.

Here, I use the term 'neural network' not in the strict Artificial Intelligence sense, but in a more general one.


Consider a simple model of the human brain. It has a set of data inputs (visual, auditory, chemical - taste and smell, pressure - touch, thermal, inertial - the ear canals) and a memory structure. Data input is captured in short-term memory. Brain processing adds context, turning it into information, then sorts and filters it and moves it to long-term memory. This move to long-term memory is thought to happen during sleep and dreaming.
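
To make that data flow concrete, here is a toy sketch in Python. It is purely illustrative - the class and method names are my own invention, and nothing in it pretends to be real neurophysiology; it just mirrors the capture, contextualise and consolidate sequence described above.

    class Brain:
        def __init__(self):
            self.short_term = []   # raw, recent sensory inputs
            self.long_term = {}    # consolidated memories, keyed by context

        def sense(self, channel, data):
            # Data input (visual, auditory, taste, touch, ...) is captured
            # in short-term memory first
            self.short_term.append((channel, data))

        def sleep(self):
            # Consolidation: add context (here, just channel + data), then
            # move each item to long-term memory and clear short-term memory
            for channel, data in self.short_term:
                context = channel + ":" + str(data)
                self.long_term[context] = data
            self.short_term = []

    brain = Brain()
    brain.sense("taste", "banana")
    brain.sense("visual", "yellow fruit")
    brain.sleep()
    print(sorted(brain.long_term))   # ['taste:banana', 'visual:yellow fruit']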

Both short- and long-term memory take the form of junctions between brain cells called synapses. More input on a given memory strengthens the relevant synapses - that is, repeated exposure to a given input strengthens the particular memory. For instance, the more often we taste bananas, the easier it is to 'recreate' the taste in our minds.
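
Continuing the toy example, 'repetition strengthens the synapse' can be sketched as nothing more than a counter per memory. The numbers and the update rule below are arbitrary assumptions, chosen only to show the idea that more exposure makes a memory easier to recall.

    synapses = {}   # memory -> connection strength

    def expose(memory, boost=1.0):
        # Each exposure to an input strengthens the corresponding synapse
        synapses[memory] = synapses.get(memory, 0.0) + boost

    def recall_ease(memory):
        # The stronger the synapse, the easier the memory is to 'recreate'
        return synapses.get(memory, 0.0)

    for _ in range(20):            # taste bananas twenty times...
        expose("taste of banana")
    expose("taste of durian")      # ...but durian only once

    print(recall_ease("taste of banana"))   # 20.0 - easy to recreate
    print(recall_ease("taste of durian"))   # 1.0  - much fainter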

We know that as we age, the more salient memories (those with stronger synapses formed earlier in our lives) remain easier to retrieve, while short-term memory becomes less efficient and recent (but still long-term) memories become harder to retrieve.

Sorry, where was I? Ah yes, I remember now.

Our ability to build new synapses falls off with age in most people, and once we reach maturity we are unable to grow new brain cells. Autonomic responses (e.g. breathing) use 'hard-wired' memory in the hypothalamus, a very primitive part of the brain structure that may be equated to 'read-only' memory in a computer.

So, consider Google as having a set of data inputs - primarily the bot/crawler data gathering, but also input about the 'popularity' of web pages as gathered through users' searches. The data the bots gather about a given web page - for instance the keyword relevance of its content, the number of external links to the page and so on - is converted into Google's proprietary and secret page-rank scores, which give a 'salience' to the analogous, or proxy, Google synapse. That analogous Google synapse is simply (I assume, as I am not privy to Google's design) a database row for the website/page, with the aforementioned data items (including the page-rank/scoring factors) in the columns, plus sitemap entries, site refresh rate and search history information. Plus, no doubt, lots more, as they are avid data gatherers.
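
As a rough illustration of what such a database row might hold, here is a guess in Python. To be clear, every field name and the scoring formula below are my own assumptions, not Google's actual schema or algorithm - the point is only that per-page signals can be rolled up into a single 'salience' number, much as repeated input strengthens a synapse.

    from dataclasses import dataclass

    @dataclass
    class PageRecord:
        url: str
        keyword_relevance: float = 0.0    # how well the content matches its keywords
        inbound_links: int = 0            # external links pointing at the page
        crawl_frequency_days: float = 30  # how often the bot revisits the site
        clicks_from_search: int = 0       # 'popularity' signal from search users

        def salience(self):
            # Invented stand-in for a page-rank style score: more relevance,
            # more links and more clicks strengthen this proxy 'synapse'
            return (self.keyword_relevance
                    + 0.1 * self.inbound_links
                    + 0.01 * self.clicks_from_search)

    page = PageRecord("http://example.com/bananas", keyword_relevance=0.8,
                      inbound_links=120, clicks_from_search=5000)
    print(page.salience())   # 62.8 under these made-up weights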

Surely the analogy with the human brain breaks down over time, as we would not expect the Google model to suffer from a capacity limitation or from a constraint imposed by 'technology' (as happens with the brain when we age and the synapse-building processes become less efficient).

So, what use is this analogy to us? Well, consider how we might wish to add to the human brain's capacity and extend its efficiency - we are getting into William Gibson territory now (he is the author who coined the term 'cyberspace'). Why plug extra memory chips into the brain when all that is needed is a connection to Google? Science fiction? I do not think it is that far away (less than 50 years). The potential social consequences are quite frightening to contemplate.

(c) 2010 Phil Marks
