The internet is changing. Everybody knows that, especially mathematicians and advanced users…and so do pigs. Specifically, the way users interact with search engines has been changing over the last decade.
When Internet search engines used unbiased algorithms, every user who searched with the same keywords received the same results. That is how searching for something should work: knowledge is an absolute concept, or at least it should be.
Today, most search engines use algorithms designed around the individual user, customizing what each person sees. Thus, different people get different results when they search for the same thing.
As Eli Pariser said in a recent talk, “people receive things they want to see, not what they need to see”.
I was struck when I used two different accounts, my flatmate’s and mine, to search for topics he looks up regularly but I don’t. Google produced two quite different lists of results for the two accounts: the differences came from some websites being moved into the top positions for his search but not for mine.
I found this alarming and had to take a break with some Panini before starting again.
Basically, this means that there is much less knowledge sharing; if we push this concept to the limit, people on the Internet will, in the long term, end up sharing their knowledge and searching only within their own buckets.
Ok! Piggy has to explain this very deep concept (oo)
Imagine this scenario:
- x(t) represents the number of people who know something new (for example, they publish websites that Google can crawl and make available to other users’ searches)
- t is time of course
My assumptions are that:
- If someone knows something, they don’t simply “forget” that thing (true, unless these users have serious problems that I will not discuss here)
- Someone can learn something she doesn’t know from someone else
One of the most commonly used models for this kind of dynamic (and many others, such as population growth) is the logistic curve, proposed by the Belgian mathematician and demographer Pierre-François Verhulst in 1838.
The differential equation that governs the model is

x'(t) = R · x(t) · (1 − x(t)/k)
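For the curious, this equation has a well-known closed-form solution (the familiar S-shaped curve), where x_0 = x(0) is the initial number of contributors:

$$x(t) = \frac{k}{1 + \frac{k - x_0}{x_0}\, e^{-R t}}$$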
Let’s rethink this in Internet terms.
- x(t) is the number of people who know something new, a.k.a. those who contribute to the Google database by posting/creating new content
- R is the learning rate (for our purposes, I set it to 1)
- k is the overall Internet population that makes use of this search engine (I leave it as a parameter)
The maximum of x'(t) occurs when x(t) = k/2: since x'(t) = R · x(t) · (1 − x(t)/k) is a downward parabola in x(t), it peaks exactly when the number of contributors is half the Internet population. That is the point at which knowledge sharing happens at the maximum rate.
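Here is a minimal numerical sketch of that claim (assuming Python with NumPy; the values R = 1, k = 1000 and x_0 = 10 are illustrative choices of mine, not part of the model above):

```python
import numpy as np

# Illustrative parameters (my assumptions, not fixed by the model)
R = 1.0      # learning rate
k = 1000.0   # total Internet population using the search engine
x0 = 10.0    # initial number of contributors

# Euler-integrate x'(t) = R * x * (1 - x / k)
dt = 0.01
t = np.arange(0.0, 20.0, dt)
x = np.empty_like(t)
x[0] = x0
for i in range(1, len(t)):
    x[i] = x[i - 1] + R * x[i - 1] * (1 - x[i - 1] / k) * dt

# Instantaneous rate of knowledge sharing along the trajectory
growth = R * x * (1 - x / k)

i_max = np.argmax(growth)
print(f"x at maximum growth: {x[i_max]:.1f}  (k/2 = {k / 2:.1f})")
# Prints roughly 500, i.e. half the population, as claimed above
```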
But filtering searches with an algorithm that ranks results by other factors, such as the user’s private interests, one’s search history or the interests of one’s friends, irrespective of x(t), will most certainly reduce the number of contributors x(t).
That is, it will decrease the rate of knowledge sharing.
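Purely as an illustration, and very much my own crude assumption rather than a precise model of personalized search, one way to picture this slowdown is to lower the learning rate R in the same logistic equation and see how much longer it takes the contributors to reach the k/2 sweet spot:

```python
def time_to_half(R, k=1000.0, x0=10.0, dt=0.01, t_max=50.0):
    """Euler-integrate x'(t) = R*x*(1 - x/k); return when x first reaches k/2."""
    x, t = x0, 0.0
    while x < k / 2 and t < t_max:
        x += R * x * (1 - x / k) * dt
        t += dt
    return t

# Baseline vs. a hypothetical "filtered" population that learns from others half as often
print(f"R = 1.0 -> contributors reach k/2 after t = {time_to_half(1.0):.2f}")
print(f"R = 0.5 -> contributors reach k/2 after t = {time_to_half(0.5):.2f}")
```

With R halved, the point of maximum knowledge sharing arrives roughly twice as late: the filtered Internet still gets there, but much more slowly.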
Personalized search might be useful if and only if the majority of the results come from an unbiased algorithm.
But is this really the case?