CommonCrawl on Spark: Reliably Processing 28TB of Data (6300 WARCs, 7 CC Dumps) on a Spark Cluster Crunch TBs of data with a solid, error-tolerant system on Apache Spark!
BigBanyanTree: Enriching WARC Data With IP Information from MaxMind You can gain a lot of insights by enriching CommonCrawl WARC files with geolocation data from MaxMind. Learn how to do that using Apache Spark!
Serializability in Spark: Using Non-Serializable Objects in Spark Transformations Discover strategies to effectively harness Spark's distributed computing power when working with third-party or custom library objects that aren't serializable.
Building ScriptScope Part 1: Extracting Top Used JS Libraries from Common Crawl using Apache Spark Learn how you can build website technology analysis tools like BuiltWith and Wappalyzer! In this blog, we identify the top-used JS libraries of 2024!
Generating Synthetic Text2SQL Instruction Dataset to Fine-tune Code LLMs Creating text2SQL data with defined roles and sub-topics that guide natural language question generation using GPT-4.
Multi-Doc RAG: Leverage LangChain to Query and Compare 10K Reports Embark on a deep dive into RAG as we explore QnA over multiple documents and the fusion of cutting-edge LLMs with LangChain. Learn how LangChain works along the way!
Exploring Agents: Get Started by Creating Your Own Data Analysis Agent LLMs have taken the world by storm, but on their own they CAN'T do any particular task very well and often produce unreliable results. If there's one thing they do well, it's following cleverly and meticulously crafted prompts. Let's use this ability of theirs to build some agents!