Rashik Parmar, President, IBM Academy of Technology
It is estimated that there are approximately four zettabytes of data today (a zettabyte is 10^21 bytes), and this total is increasing at 50 percent per annum. This growth is driven by the proliferation of digital devices and technologies such as sensors, embedded processors and even security cameras. Data is being gathered and analyzed at an unprecedented pace in an attempt to find new ways to optimize or increase the effectiveness of systems such as healthcare, transportation and energy.
Just stop to consider these examples around us: the typical airplane engine has over 1,000 sensors constantly monitoring performance; three billion smartphones are being used by individuals to instantly share their world with family and friends; billions of RFID tags are used to track everything from the movement of goods to the performance of athletes; the sensors on a typical Formula 1 car create 25 megabytes of data in one lap; over 100 hours of video are uploaded to YouTube every minute. We are now entering a world in which everyone, everything and every organization will be constantly creating data.
You might wonder why, with so much data available, there is still so much waste and inefficiency in all aspects of our daily lives. Congested roadways cost the US economy US$78 billion annually, through 4.2 billion lost working hours and 2.9 billion gallons of wasted petrol, and that is not counting the impact on air quality. Inefficient supply chains cost US$40 billion annually in lost productivity, more than 3 percent of total sales.
Our healthcare system really isn’t a ‘system’. It fails to link diagnoses, drug delivery, healthcare providers, insurers and patients, even as costs spiral out of control, threatening both individuals and institutions. One in five people living today lacks safe drinking water. And we have seen what happened to our financial markets, a system in which institutions were able to spread risk but not track it.
The difficulty is that data alone is of little value. Only when raw data is combined and contextualized does it become valuable information that helps provide answers. The challenge ahead is to derive that value in a timely and affordable manner. Advances in cognitive computing technologies are already suggesting that we will soon be able to address many of society’s challenges.
So what is cognitive computing?
Cognitive computing refers to a new breed of computer systems that can learn for themselves rather than needing to be explicitly programmed. Attempts to create computer systems that are able to think go back as far as the 1950s, when Alan Turing proposed his famous test of whether the conversational capabilities of a computer system could be indistinguishable from those of a human being.
Advances in programming techniques through languages such as Prolog and Lisp in the 1980s showed promise in describing complex logic and allowing computer systems to derive results that were not explicitly programmed. In parallel, advances in computational linguistics provided modeling techniques that drove progress in text analytics and natural language processing. However, the breakthrough came in 2011 when, using the US game show Jeopardy! as a stage, IBM’s DeepQA system (better known as Watson) beat two of the show’s most successful champions in a quick-fire, general knowledge quiz.
DeepQA is a computer system that can directly and precisely answer natural language questions over an open and broad range of knowledge. It differs from traditional computing systems in that it is not programmed to answer a specific set of pre-defined questions. Instead, the DeepQA system is able to dissect any question and identify a series of possible responses from a pool of structured or unstructured data. Using a complex technique of evidence scoring, the system ranks the possible answers and selects the most appropriate responses. Today, DeepQA has advanced question-answering technology to a point where it clearly and consistently rivals the best human performance.
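To make the idea of generating candidate answers and ranking them by evidence more concrete, here is a deliberately tiny sketch in Python. It is not IBM's DeepQA algorithm, which is far more sophisticated; it only illustrates the general shape of the approach described above. The corpus, the stopword list and the function names (`terms`, `rank_answers`) are all hypothetical, and the scoring rule (counting question words that also appear in a supporting passage) is an assumption chosen for simplicity.

```python
import re
from collections import defaultdict

# Hypothetical toy corpus: (candidate answer, supporting passage) pairs.
# A real system would draw candidates from large structured and unstructured sources.
CORPUS = [
    ("Alan Turing", "Alan Turing proposed a test of machine conversation in 1950"),
    ("Alan Turing", "The Turing test asks whether a machine can pass as human in conversation"),
    ("John McCarthy", "John McCarthy designed the Lisp programming language"),
    ("Alain Colmerauer", "Alain Colmerauer helped create the Prolog language"),
]

# Assumed stopword list, kept short for illustration.
STOPWORDS = {"a", "an", "the", "of", "in", "is", "as", "who", "can", "whether", "to"}


def terms(text: str) -> set[str]:
    """Dissect a text into its lower-cased content words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def rank_answers(question: str, corpus=CORPUS) -> list[tuple[str, float]]:
    """Score each candidate by word overlap with its evidence, then rank.

    Returns (answer, confidence) pairs, best first. Confidence is the
    candidate's share of the total evidence score (an assumed normalization).
    """
    question_terms = terms(question)
    scores: dict[str, float] = defaultdict(float)
    for answer, passage in corpus:
        # Each passage that shares words with the question counts as evidence.
        scores[answer] += len(question_terms & terms(passage))
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    total = sum(score for _, score in ranked) or 1.0
    return [(answer, score / total) for answer, score in ranked]


if __name__ == "__main__":
    for answer, confidence in rank_answers("Who proposed a test of machine conversation?"):
        print(f"{answer}: {confidence:.2f}")
```

With this toy corpus, the question "Who proposed a test of machine conversation?" ranks "Alan Turing" first, because both of the passages that support that candidate share words with the question while the others share none. The point is the structure (dissect, collect candidates, score evidence, rank), not the scoring rule itself.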
Applying DeepQA techniques opens a wide variety of opportunities to assist knowledge workers in making complex decisions. Early trials with doctors have already demonstrated how it can help improve the treatment of patients. For example, in 15 seconds the system is able to cross-reference the records of a million cancer patients to help identify the right care plan.
However, the DeepQA system can only be as good as the information available to it, so creating a comprehensive contextual repository represents both a computing and a networking challenge. Data from a vast range of sources needs to be collated and understood. At the same time, anomalies need to be identified and assessed to determine their cause. Next, contextual information needs to be created so that DeepQA systems have the evidence to support their answers. Early analysis has shown that for every byte of unstructured text, 10 bytes of metadata and a further 100 bytes of relationship data need to be created. Moreover, 1,000 bytes are needed to create context and meaning in order to mimic human understanding.
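The ratios above add up quickly. As a rough illustration, the short Python sketch below estimates the size of such a repository for a given amount of unstructured text. It assumes the three overheads are additive, which the figures suggest but do not state outright; the names `OVERHEAD_BYTES_PER_TEXT_BYTE` and `repository_footprint` are hypothetical.

```python
# Overhead per byte of unstructured text, taken from the figures quoted above.
# Assumption: the three layers are additive rather than overlapping.
OVERHEAD_BYTES_PER_TEXT_BYTE = {
    "metadata": 10,
    "relationship data": 100,
    "context and meaning": 1000,
}


def repository_footprint(text_bytes: int) -> dict[str, int]:
    """Estimate the bytes needed for each layer of a contextual repository."""
    footprint = {"unstructured text": text_bytes}
    for layer, ratio in OVERHEAD_BYTES_PER_TEXT_BYTE.items():
        footprint[layer] = text_bytes * ratio
    # The sum is evaluated before the "total" key is added, so it is not double counted.
    footprint["total"] = sum(footprint.values())
    return footprint


if __name__ == "__main__":
    one_gigabyte = 10**9
    for layer, size in repository_footprint(one_gigabyte).items():
        print(f"{layer}: {size:,} bytes")
```

Under that additive assumption, every byte of text becomes 1,111 bytes in the repository, so one gigabyte of unstructured text implies roughly 1.1 terabytes to store and move.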
Being able to move data to a place where it can be of value is one of the fundamental reasons data networks exist. With the advent of the Internet and its standards, it is much simpler to make use of data for purposes previously thought impossible or unimaginable.