BigBanyanTree: Enriching WARC Data With IP Information from MaxMind You can gain a lot of insights by enriching CommonCrawl WARC files with geolocation data from MaxMind. Learn how to do that using Apache Spark!
Zero to Spark - BigBanyanTree Cluster Setup This post details how to set up an Apache Spark cluster in standalone mode for data engineering, using Docker Compose on a dedicated Hetzner server. It also covers setting up other utilities, such as JupyterLab and a Llama-3.1 8B LLM service.
PharmAssist AI: Revolutionizing Pharmaceutical Research with AI and RAG on FDA Open Medicine Dataset - Part 1
Guide to using OpenAI Assistant API Discover how OpenAI's Assistant API can transform your applications with intelligent virtual assistants. Explore tools and integrations that elevate user interactions effortlessly.
Building a Local Arxiv Paper Search Engine Discover how our local search engine for academic papers streamlines research, returning quick and relevant results through data processing and semantic search.
If you had the chance to go back in time, what would you do differently about your data science journey?