This blog is named The Information Age, and that isn’t a coincidence. Information is at the core of our time and is regarded as a new form of wealth. I have been posting here about Artificial Intelligence (AI) and machine/deep learning for a while, but for the next few weeks this blog will shift its focus to information issues proper. AI, or intelligence issues for that matter, are forms of information organization. But information has other important topics to examine, research and discuss. For example, the field of Data Mining is one of the most important for Data Science. Data Mining is a sophisticated form of information retrieval.
From here we could also think about the theoretical topic of Information Theory. This blog has already posted on this. Quantum Computing/Information has also been a preferred topic here, here and here. But the next posts will focus on Data Mining or issues in information retrieval. Recently I found an interesting ordered list of videos on YouTube from a course on Data Mining: Stanford Statistical Data Mining. The videos are from 2007, but what they cover isn’t much out of date (a bit, maybe…). In any case, I thought it would make a good educational/pedagogical set of posts for the keen audience of this blog. It is also important for me: as I outlined from the beginning, this site is a mix of research review/interpretation of cutting-edge topics and my own way of keeping pace with developments.
But the video I will share and comment on today deals mostly with issues in information fostering/retrieval. This topic is of interest in many applications, business ones being the most talked about. The speaker begins by making an important distinction in information retrieval/seeking: on the one hand, we may approach information retrieval from the point of view where we have a need and go searching for the information to address that need; on the other hand, the other side of the coin is when we do not know from the outset that we have a need. Indeed, the presenter talks about this perspective as a dark side of information seeking/retrieval. This introduces us to the issue of risk in the process of retrieval. What I think we should understand here is that this is another classical trade-off between risk and reward, but one with another element mixed in: the potentially highly innovative character of what is being done. Modern Data Science is an intensely cutting-edge, innovative computing activity, so the potential for discovering previously unseen, unexpected information that leads to new discoveries or new theories is a major plus.
In the video from Microsoft Research, Chirag Shah from the Rutgers University Computer Science Department, after introducing the topic he will talk about, begins by formulating several bullet points around the issue of retrieval. One of the points that caught my attention was that people don’t know what they don’t know. It seems like a tautological argument, but it is not; it may point to a deeper direction of philosophical inquiry about the relationship between information and knowledge. We know that information is not the same as knowledge; often enough, our information-intensive societies spend quite a lot of time (with the related costs, as the new plague of fake news on social media bears witness…) trying to disentangle that obvious fact. What this points to is the broader distinction between common sense (basic facts about reality we all share with one another) and common knowledge, that is, when we have more formal ways to ascertain that what we know is also known by other minds. Common knowledge is hard, and a potentially misleading problem. Improved information retrieval/seeking, of the kind presented in the video, might help in avoiding those problems.
One aspect of the talk that is especially relevant concerns exploratory search. Here Dr. Shah explicitly mentions the importance of process and process learning, as opposed to only evaluating the result of a retrieval/seeking effort. Understanding a process is quite similar to understanding a pattern recognition classification problem, and this isn’t surprising, given what we now know about the interplay between data science and machine learning. The meritocratic/creative nature of the process is also emphasized in this section of the talk by the mention of the greater credit attributed to difficult-to-find information.
The next part of the talk dealt with collaboration in information seeking. It is rightly pointed out that collaboration isn’t always a good thing to do. It is intuitive, but not always the baseline approach. A good case mentioned was the complexity of events such as an oil spill, where conflicting workloads might emerge within larger dataset queries.
The final part of the talk was about information foraging/seeking behavior and its relation to Big Data. Here the interesting part concerns the data collection process, the exploration/exploitation trade-off, and how to uncover the relationship between human or even animal foraging and the automated/computing process used to discover novel information of interest. Also of note are the correlations between the different unique distributions of patterns in the foraging and how these relate to inter-query diversity.
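To make the idea of inter-query diversity a bit more concrete, here is a minimal sketch in Python. It is my own illustration, not the measure used in the talk: I am assuming that a search session is just a list of query strings, and that diversity can be approximated by the average Jaccard distance between the word sets of consecutive queries. The function names `jaccard_distance` and `inter_query_diversity` are hypothetical.

```python
from typing import List


def jaccard_distance(query_a: str, query_b: str) -> float:
    """Return 1 - |A ∩ B| / |A ∪ B| over the lowercase word sets of two queries."""
    terms_a = set(query_a.lower().split())
    terms_b = set(query_b.lower().split())
    union = terms_a | terms_b
    if not union:
        # Two empty queries are treated as identical (assumption).
        return 0.0
    return 1.0 - len(terms_a & terms_b) / len(union)


def inter_query_diversity(queries: List[str]) -> float:
    """Average Jaccard distance between consecutive queries in one session.

    Values near 0 suggest the searcher keeps reformulating the same query
    (exploitation); values near 1 suggest jumps to new topics (exploration).
    This is an assumed proxy, not a definition taken from the talk.
    """
    if len(queries) < 2:
        return 0.0
    distances = [
        jaccard_distance(previous, current)
        for previous, current in zip(queries, queries[1:])
    ]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    session = ["oil spill", "oil spill cleanup", "gulf fishing"]
    print(inter_query_diversity(session))
```

In this toy session the first reformulation only adds one word, while the second query shares no words with the one before it, so the score lands between the two extremes. A real study would likely need better text normalization (stemming, stop words) than simple whitespace splitting.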
The main message of this talk, which its title rightly alludes to, is that being proactive in information seeking has a big payoff in our big information age. Timing is of the essence, as Dr. Shah’s comment in the final wrap-up reminds us.