Vihaan

How search engines work

to-learn
technology
unfinished

Theory

Crawler + Index + Algorithm

indexer, searcher, stemmer, ranker

inverted index?
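To make the "inverted index?" question concrete, here is a minimal sketch, assuming a tiny in-memory corpus. An inverted index maps each token to the documents that contain it, so a search becomes a set lookup instead of a scan over every document. The names `tokenize`, `build_inverted_index`, `search_all` and the sample `docs` are hypothetical, made up for this note.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents that contain every query token (AND search)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents, only for illustration.
docs = {
    "a": "Crawlers fetch web pages",
    "b": "An indexer builds an inverted index from pages",
    "c": "A ranker orders search results",
}
index = build_inverted_index(docs)
print(search_all(index, "inverted pages"))
```

This only answers "which documents match"; ordering the matches is the ranker's job.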

Building a personal search engine

Examples

https://github.com/thesephist/monocle?tab=readme-ov-file#monocle-
Data in the form of modules → tokenizer → indexer (inverted index)
Query → tokenizer → stemming expansion → search → rank (tf-idf)
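The tokenizer → stemming → search → rank (tf-idf) flow noted for Monocle can be sketched roughly as below. This is not Monocle's code: the suffix-stripping `stem` is a crude stand-in for a real stemmer, stemming is applied to both documents and queries as a simplification of "stemming expansion", and the function names and sample `docs` are assumptions for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token):
    """Crude suffix stripper; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, then stem every token."""
    return [stem(token) for token in tokenize(text)]


def build_index(docs):
    """Return term -> {doc_id: count} postings plus each document's length."""
    postings = defaultdict(dict)
    lengths = {}
    for doc_id, text in docs.items():
        terms = analyze(text)
        lengths[doc_id] = len(terms)
        for term, count in Counter(terms).items():
            postings[term][doc_id] = count
    return postings, lengths


def rank(query, postings, lengths):
    """Score documents by summed tf-idf over the query terms, best first."""
    n_docs = len(lengths)
    scores = defaultdict(float)
    for term in set(analyze(query)):
        matches = postings.get(term, {})
        if not matches:
            continue
        idf = math.log(n_docs / len(matches))
        for doc_id, count in matches.items():
            tf = count / lengths[doc_id]
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample documents, only for illustration.
docs = {
    "notes": "indexing notes and searching notes",
    "crawl": "crawling web pages",
    "rank": "ranking pages by score",
}
postings, lengths = build_index(docs)
print(rank("searched notes", postings, lengths))
```

Because "searched" and "searching" both reduce to "search", the query still finds the first document even though the exact word never appears in it.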

https://www.youtube.com/watch?v=PWTPSukXeIg
https://github.com/siddhantdubey/Sidgrep?tab=readme-ov-file
YouTube transcript data → OpenAI embeddings → Pinecone API for storage
Search → OpenAI embeddings → Pinecone API gives results

https://www.youtube.com/watch?v=UUnAcrzA0nA
https://github.com/thesephist/ycvibecheck/tree/main

https://marketbrew.ai/understanding-query-parsers-how-search-engines-process-your-searches
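The embedding-based flow in the Sidgrep notes (text → embedding → vector store → nearest results) can be sketched without any external service. In this sketch the three-dimensional vectors are hand-made placeholders for what an embeddings API would return, and `TinyVectorStore` is a hypothetical in-memory stand-in for a hosted vector database; neither reflects the real client APIs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a hosted vector database."""

    def __init__(self):
        self._items = []

    def add(self, item_id, vector, text):
        """Store one embedded chunk of text."""
        self._items.append((item_id, vector, text))

    def query(self, vector, top_k=3):
        """Return the top_k (score, id, text) rows most similar to vector."""
        scored = [
            (cosine_similarity(vector, stored), item_id, text)
            for item_id, stored, text in self._items
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]


# Made-up vectors; a real pipeline would get these from an embedding model.
store = TinyVectorStore()
store.add("clip-1", [0.9, 0.1, 0.0], "intro to crawling")
store.add("clip-2", [0.1, 0.9, 0.1], "ranking results")
store.add("clip-3", [0.0, 0.2, 0.9], "vector embeddings")

query_vector = [0.8, 0.2, 0.1]  # placeholder for the embedded search query
results = store.query(query_vector, top_k=2)
for score, item_id, text in results:
    print(item_id, text, round(score, 3))
```

The brute-force scan in `query` is fine for a personal corpus; dedicated vector stores exist to avoid that scan at larger scale.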

Crawler

https://michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours/
https://jsoup.org/
https://github.com/yasserg/crawler4j
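As a rough picture of what a crawler does, here is a minimal breadth-first sketch using only the Python standard library. It is not how jsoup or crawler4j work internally. The `fetch` callable and the `FAKE_SITE` pages are assumptions so the example runs offline; a real crawler would fetch over HTTP and also respect robots.txt, rate limits, and retries, all of which are left out here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, with fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    links = []
    for href in parser.links:
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute.startswith(("http://", "https://")):
            links.append(absolute)
    return links


def crawl(seed, fetch, max_pages=10):
    """Breadth-first crawl; fetch(url) returns HTML text or None on failure."""
    queue = deque([seed])
    seen = {seed}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Fake three-page site so the sketch runs without network access.
FAKE_SITE = {
    "https://example.test/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.test/a": '<a href="/">home</a> <a href="/b">B</a>',
    "https://example.test/b": '<a href="mailto:someone@example.test">mail</a>',
}
pages = crawl("https://example.test/", FAKE_SITE.get)
print(list(pages))
```

The `seen` set is what keeps the crawl from looping when pages link back to each other, and the returned `pages` mapping is the raw material an indexer would consume.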

React and Next

https://react.dev/
https://nextjs.org/

https://www.pinecone.io/learn/series/nlp/dense-vector-embeddings-nlp/
https://jamescalam.medium.com/free-course-on-vector-similarity-search-and-faiss-9b3e91a91384
https://www.webfx.com/blog/internet/what-is-a-web-crawler/
https://www.youtube.com/watch?v=7RF03_WQJpQ&t=203s

Created with Quartz v4.4.0 © 2025