Reading Shane Legg's post "unreasonable effectiveness" and the related article "The Unreasonable Effectiveness of Data" by Alon Halevy, Peter Norvig and Fernando Pereira, I am pleased to see researchers' attention moving toward the investigation of huge data sets as an important resource for solving problems.
I have a different opinion about the word "unreasonable": I think this effectiveness is an inevitable consequence of the theoretical hypotheses that emerge from empirical evidence.
In the article, Norvig discusses text translation, text comprehension, Web 2.0, and so on. I claim that a huge set of good data (1 terabyte of "0"s is useless; which data is good? A first step here) is inevitably a good resource for speeding up every "difficult" problem. In the previous post I showed how the Solomonoff universal distribution changes under a limit M on the available resources, and in that example it is possible to see how important it is to know the set of existing data. Knowledge of a huge data set makes it possible to increase the knowledge of R, and this leads to a General Real Distribution: a distribution over the real world, a sub-universal distribution that is simpler to compute (for the difference from exponential behaviour see The Shannon Discrepancy 1 2) and correct for real problems.
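To make the idea of a resource-bounded universal distribution concrete, here is a minimal toy sketch. It is not the construction from the previous post: the "programs" are just bit strings run by a trivial repeat-interpreter, and the budget parameter plays the role of the resource limit M by cutting off every output at M bits. Each program of length k contributes weight 2^-k to the probability of its output, as in the Solomonoff prior.

```python
from collections import defaultdict

def toy_prior(max_len, budget):
    """Toy resource-bounded Solomonoff-style prior.

    'Programs' are the bit strings p with 1 <= len(p) <= max_len. The
    (hypothetical) interpreter repeats p until the output reaches
    `budget` bits -- the resource limit M. Each program contributes
    weight 2**-len(p) to the probability of its output string.
    """
    mass = defaultdict(float)
    programs = ['']
    for _ in range(max_len):
        # extend every program by one bit
        programs = [p + b for p in programs for b in '01']
        for p in programs:
            out = (p * (budget // len(p) + 1))[:budget]  # run within budget M
            mass[out] += 2.0 ** (-len(p))
    total = sum(mass.values())
    return {s: m / total for s, m in mass.items()}

prior = toy_prior(max_len=3, budget=6)
```

Even in this toy model, the characteristic behaviour is visible: a regular string such as "000000" accumulates mass from several short programs ("0", "00", "000"), while less regular strings are reachable only by longer programs and so receive less probability.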