Publication at ACM Digital Library
The WebEngine: A Fully Integrated, Decentralised Web Search Engine
Proceedings of the "NLPIR 2018 - 2nd International Conference on Natural Language Processing and Information Retrieval" in Bangkok/Thailand, September 07 - 09, 2018:
This paper presents a basic, new concept for decentralised web search that addresses major shortcomings of current web search engines. Its methods are characterised by their local working principles, which make it possible to employ them on diverse hardware configurations. The concept's implementation in the form of an interactive, librarian-inspired peer-to-peer software client, called 'WebEngine', is elaborated on in detail. This software extends and interconnects common web servers, forming a decentralised web search system on top of the existing web structure while, for the first time, combining modern text analysis techniques with novel and efficient search functions as well as approaches for the semantically induced P2P-network construction and its flexible management. This way, an alternative, fully integrated and powerful web search engine is built under the motto 'The Web is its own search engine.', making the web searchable without any central authority.
Try out the WebEngine-prototype
Text-representing Centroid Terms
After reading only a few lines, human readers are able to determine which category of texts and which abstract topic category a given document belongs to. This demonstrates how well and how fast the human brain, especially the cortex, can process and interpret data. It is able to understand not only the meaning of single words, as representations of real-world entities, but also that of certain compositions of them. In addition, the brain acts as a knowledge base when topically classifying content not seen before. It tries to match the terms (i.e. words carrying meaning) in such documents with previously learnt terminology and can, in doing so, instantly and unconsciously classify them at least coarsely.
Text-representing centroid terms are a new method and technology, inspired by physics and by processes in the brain, that is intended to support these tasks better than conventional approaches, which are mostly based on bag-of-words or term frequency – inverse document frequency (TF-IDF) representations.
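For orientation, the conventional TF-IDF weighting mentioned above can be sketched in a few lines. This illustrates only the baseline, not the centroid-term method. The function name `tf_idf`, the toy documents, and the particular variant (relative term frequency multiplied by the natural logarithm of the inverse document frequency) are assumptions chosen for this example; other TF-IDF variants are common.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenised, non-empty document.

    Assumed variant: weight = (count / document length) * ln(N / df),
    where N is the number of documents and df is the number of
    documents that contain the term.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already split into terms.
docs = [
    "web search engine index".split(),
    "peer to peer web search".split(),
    "web text classification brain".split(),
]
scores = tf_idf(docs)
```

In this toy corpus, "web" occurs in every document and therefore receives weight zero, while terms confined to a single document are weighted highest. The weights depend only on term counts, not on how the terms relate to one another.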