Today I come back to the series of posts featuring definitions from WhatIs.com, the glossary website on computing, computer science and data science subjects. I thought it timely to post this definition, as it concerns an issue central to proper and competent data engineering and data science practice, a theme often aired here at The Information Age.
I have posted before on the subject of smart data, for instance in the context of big data, self-driving cars and deep learning, so this post adds to that notion. We are at the beginning of a digital trend that encompasses the implementation of digital technologies, interfaces and the paradigm of devices connected to the Internet (the Internet of Things and the Industrial Internet of Things), and that promises a wide range of opportunities for businesses and institutions of every kind. But the need for proper management of the real fuel of all these industries, which is data (structured or unstructured), will only increase. This calls for the skill set around data management to be constantly updated.
This WhatIs.com entry is in that spirit, helping readers check and update their knowledge of a data topic. Smart data is data that is, by itself, already a form of automated information. The devices of tomorrow, and some of today’s, will need not just more data but also smarter data. Let us see what the entry says:
Smart data is digital information that is formatted so it can be acted upon at the collection point before being sent to a downstream analytics platform for further data consolidation and analytics. The term smart data is often associated with the Internet of Things (IoT) and the data that smart sensors embedded in physical objects produce.
The label smart is directly related to a data entry point being intelligent enough to make some types of decisions on incoming data immediately, without requiring processing power from a centralized system. In the past, most analytics was done with batch processing. Data was collected according to schedule, converted to a desired state, put into a database and processed on an hourly, overnight or weekly basis. A drawback of this approach is that by the time the data is analyzed, it’s already old. In contrast, smart data analytics programming (also called streaming analytics) monitors data at the source, captures events that are exceptions, assesses them, makes a decision and shares the output — all within a specific window of time consisting of seconds or fractions of a second.
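To make the contrast with batch processing more concrete, here is a minimal Python sketch of the kind of decision a data entry point could make on incoming readings. It is only an illustration under my own assumptions, not how any particular streaming analytics product works: the names `Event` and `detect_exceptions`, the rolling window of five readings and the tolerance of 5.0 units are all hypothetical choices made for the example.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean
from typing import Deque, Iterable, Iterator


@dataclass
class Event:
    """An exceptional reading flagged at the collection point (hypothetical type)."""
    index: int       # position of the reading in the stream
    value: float     # the reading that triggered the event
    baseline: float  # rolling average of recent normal readings


def detect_exceptions(
    readings: Iterable[float],
    window: int = 5,         # assumed size of the rolling baseline
    tolerance: float = 5.0,  # assumed allowed deviation from the baseline
) -> Iterator[Event]:
    """Yield an Event as soon as a reading deviates from the rolling baseline.

    Readings are examined one by one as they arrive, so nothing waits
    for a scheduled batch job.
    """
    recent: Deque[float] = deque(maxlen=window)
    for index, value in enumerate(readings):
        if len(recent) == window:
            baseline = mean(recent)
            if abs(value - baseline) > tolerance:
                # Exception captured and assessed: share it immediately.
                yield Event(index, value, baseline)
                continue  # keep the outlier out of the baseline
        recent.append(value)


# Example stream: a temperature sensor with one abnormal spike.
stream = [20.0, 20.1, 19.9, 20.0, 20.1, 35.0, 20.0]
events = list(detect_exceptions(stream))
for event in events:
    print(f"reading {event.index}: {event.value} (baseline {event.baseline:.2f})")
```

In this sketch only the exceptional reading would be passed downstream, while the normal readings are handled at the source. A real deployment would choose the window and the tolerance according to the sensor and the application.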
A self-driving car, for example, can’t afford to wait for data to be sent up to the cloud and output to be sent back. It requires data gathered through sensors to be smart, so the data can be immediately analyzed by the automobile’s processors and outputs can immediately be sent to actuators that control the car’s brakes and steering wheel. If the data is not in a form that can be analyzed as soon as processors receive it, the consequences can be deadly.
Data scientists, business analysts, IT managers, marketing professionals and manufacturers are also experimenting with how to use edge computing and smart data devices to bring in more revenue, improve decision-making processes and spot problems before equipment fails.
One striking feature of this entry is the potential ethical implication of smart data in the self-driving car application. The car industry has been pushing forcefully to speed up the realization of self-driving cars, but the reality is that until the data feeding the car is smart enough, and real-time enough, no one in their right mind will conclude that self-driving cars are safe and ready for mainstream adoption. As the entry rightly points out, the consequences of data that cannot be analyzed immediately can be deadly, so the technology around data analytics programming and streaming analytics might still have a long way to go before it gives the car industry the guarantees it needs for self-driving cars to become a full reality.
featured image: Digital Technologies Website