Building a Personal Chatbot: Bringing Your AI Self to Life Build the AI version of yourself and allow your connections to chat with it and ask about your personality and interests.
CommonCrawl on Spark: Reliably Processing 28TB of Data (6300 WARCs, 7 CC Dumps) on a Spark Cluster Reliably crunch TBs of data with a solid error-tolerant system on Apache Spark!
Exploring Cloudflare Vectorize: A Free Vector Database for Developers Cloudflare has launched Vectorize, a highly accessible vector database that anyone can use, even on a Free Tier account! Vectorize enables developers to experiment with powerful vector search capabilities and build applications using advanced, unstructured data representations.
BigBanyanTree: Enriching WARC Data With IP Information from MaxMind You can gain a lot of insights by enriching CommonCrawl WARC files with geolocation data from MaxMind. Learn how to do that using Apache Spark!
Serializability in Spark: Using Non-Serializable Objects in Spark Transformations Discover strategies to effectively harness Spark's distributed computing power when working with third-party or custom library objects that aren't serializable.
Building ScriptScope Part 1: Extracting Top Used JS Libraries from Common Crawl using Apache Spark Learn how you can build website technology analysis tools like builtWith and Wappalyzer! In this blog, we identify the top-used JS libraries for 2024!
BigBanyanTree: Parsing HTML source code with Apache Spark & Selectolax Dive into the world of data extraction! Learn how to parse HTML source code from Common Crawl WARC files with Apache Spark and Selectolax for insightful analysis and unlock the potential of HTML source code.
Zero to Spark - BigBanyanTree Cluster Setup This blog details the setup of an Apache Spark cluster in standalone mode for data engineering using Docker compose on a dedicated Hetzner server. It also covers the setup of other utilities such as Jupyterlab and Llama-3.1 8B LLM service.
Building a Local Arxiv Paper Search Engine Discover how our local search engine for academic papers streamlines research, providing quick and relevant results through advanced data processing and semantic search!
llamafile : An Executable LLM LLM Deployment with llamafile: Discover how Mozilla’s llamafile simplifies LLM deployment into a single executable. Learn to optimize Docker images, explore alternatives like llama.cpp and ollama, and leverage quantized models for efficient resource use.
Guide to using OpenAI Assistant API Discover how OpenAI's Assistant API can transform your applications with intelligent virtual assistants. Explore tools and integrations that elevate user interactions effortlessly.
Leveraging AI for Financial Insights: From Data Fetching to Time Series Analysis Performing financial analysis by automating data fetching, generating complex SQL queries and performing detailed time series analysis, turning raw financial data into actionable insights.
PharmAssist AI: Detailed Implementation and Advanced AI Integration - Part 3 This installment dives into the technical implementation and sophisticated AI techniques that make PharmAssist AI a state-of-the-art tool for pharmaceutical research.
PharmAssist AI: Leveraging Extensive Datasets for Enhanced Pharmaceutical Research - Part 2 In this installment, we delve into the various datasets that form the backbone of PharmAssist AI, illustrating how these rich data sources are harnessed to provide comprehensive and accurate drug information.
PharmAssist AI: Revolutionizing Pharmaceutical Research with AI and RAG on FDA Open Medicine Dataset - Part 1 PharmAssist AI, a groundbreaking application designed to streamline the learning and research workflow for pharmaceutical professionals and students.
Dafinchi.ai - Quick Insights From Financial Documents dafinchi.ai helps you analyze complex company filings like 10-Ks in just a few clicks, making it perfect for staying informed about your investment options.
Mastering Object Detection with YOLO This blog defines object detection and then describes its applications and traditional methods. It further describes each YOLO iteration in detail. It also provides practical insights and commands for training YOLOv8 on custom datasets in Google Colab.
Generating Synthetic Text2SQL Instruction Dataset to Fine-tune Code LLMs Creating text2SQL data with defined roles, and sub-topics guiding natural language question generation using GPT-4.
Crafting Real-World Like Data For E-Commerce Domain Databases Creating authentic e-commerce data involving data generation, behavioral simulation and schema design for accurate analysis and process refinement.
Instruction fine-tuning code LLMs - An overview In the ever-evolving landscape of software development, the quest for enhancing the coding capabilities of large language models (LLMs) has led to innovative methodologies for fine-tuning these models.
Leveraging LLMs in Recommendation Systems Discover personalized content effortlessly with LLM-powered recommendation systems!
Movie Recommender System Using PySpark Explore the world of cinema with our cutting-edge PySpark movie recommender system that provides tailored movie suggestions to match your unique tastes and preferences.
Exploring News Article Similarity with PySpark: A Step-by-Step Guide Dive into news article similarity analysis with PySpark – unlocking insights at scale!
Unlocking the Power of PySpark SQL: An end-to-end tutorial on App Store data Discover the transformative capabilities of PySpark SQL querying in our comprehensive tutorial. Unleash the power of big data analytics with ease and efficiency!
Multi-Doc RAG: Leverage LangChain to Query and Compare 10K Reports Embark on a deep dive into RAG as we explore QnA over multiple documents, and the fusion of cutting-edge LLMs and LangChain. Learn how LangChain works along the way!
Exploring Leading Large Language Models: A Perspective on Today's AI Giants Discover the magic of large language models—the heart of AI giants. Take a look at how they have evolved. Join the journey into the future of artificial intelligence!
Unlocking Opportunities: Dive into the World of Data Annotation and Labeling Jobs Explore the world of data annotation opportunities! Unleash the power of AI by labeling and curating data. Join the journey where every annotation shapes the future of machine learning!
Empowering LLMs: Tools for Harnessing Human Expertise in AI Workflows Unlock the potential of LLMs with cutting edge tools by leveraging human input and feedback in the workflow. Explore the seamless and powerful Argilla tool, enhancing LLMs for trustworthy and accurate language processing.
Exploring Agents: Get started by Creating Your Own Data Analysis Agent LLMs have taken the world by storm, but on their own, they CAN'T do any particular task very well and often produce unreliable results. But if there's one thing that they do well, it is following cleverly and meticulously crafted prompts. Let's use this ability of theirs to build some agents!
BrilliantSoupGPT: Revolutionizing HTML Parsing with Advanced AI BrilliantSoupGPT isn't just an ordinary tool; it's a groundbreaking blend of beauty and brilliance in the realm of HTML parsing and data extraction.
Understanding Byte Pair Encoding (BPE) in Large Language Models (LLMs) This fixed vocabulary constraint becomes particularly problematic for languages with complex word formation processes like agglutination and compounding, which can create a vast number of word variations that a fixed-size vocabulary can't cover.
Find LLM and Text2Image APIs to use for your next project Developers seeking to use LLMs programmatically need to find API providers for various models. While OpenAI provides APIs for ChatGPT and GPT4 LLMs, there are many other providers of APIs for open source and other proprietary models.
Mastering Retrieval Augmented Generation (RAG) Product Development: A Guide for Everyone Retrieval Augmented Generation (RAG) products stand at the forefront of this innovation, providing invaluable assistance in workflows where human expertise is critical.
Structuring a Machine Learning Project: A Guide Inspired by Trello's Project Planning Taking inspiration from a Trello board, we've broken down the stages of structuring a machine learning project.
Create Your Own Zoom Backgrounds with AI: Dive into New Realms Personalized Zoom backgrounds are just the tip of the iceberg when it comes to AI's capabilities. So, as you switch between a Victorian library and a tech lab for your next meetings, remember – the only limit is your imagination.
Unveiling the Cutting-Edge Nous-Hermes-Llama2-13b Language Model! Today, I'm excited to introduce you to a groundbreaking development in the world of natural language processing: the Nous-Hermes-Llama2-13b language model! For those who thrive on innovation, this might just be the next big thing in the NLP domain.
A Dive into Airoboros: Leveraging LLMs to Refine Language Models Rather than relying on conventional datasets, instruction datasets enable models to "teach themselves". The idea is rooted in automating the curation of high-quality data to help AI models refine their performance.
Embarking on the Future of AI Product Development Learn the technology concepts needed to build modern AI applications. From document processing to embeddings and vectors and all the way to integrating with LLMs, find code and explanations here.
Introducing the Fuyu-8B Model: The Future of Multimodal Intelligence We're thrilled to unveil the Fuyu-8B Model Card - a revolutionary step in the journey of multimodal artificial intelligence. This petite powerhouse, which is at the heart of our flagship product, is now accessible on HuggingFace for developers and AI enthusiasts to explore and leverage. Why is Fuyu-8B
How to Craft Effective Prompts for Stable Diffusion Models with ChatGPT In this guide, we'll walk you through using imaginative keywords and Large Language Models to craft these prompts for generating vivid images.
Setting Up a PySpark Notebook using Docker: A Step-by-Step Guide We will guide you through setting up a PySpark Jupyter Notebook using Docker. This tutorial is particularly useful for those of you keen on diving into the world of big data analytics using PySpark.
SQL Exercises with Google Play Store Data - all in your own Docker Container Have you ever wanted to learn SQL or enhance your SQL skills with real-world data? Today, we'll guide you through a journey where you can run SQL exercises on genuine Google Play Store data. The best part? All this will be encapsulated within a Docker container.
Introducing RustNLPService - Your Go-To NLP API Docker Image The RustNLPService provides a suite of NLP services for various tasks. These tasks are chosen to accomplish 90% of modern NLU workflows. All you have to do is deploy the Docker container in your environment and you are good to go.
Creating Dynamic Prompts with Jinja2 for LLM Queries When you're dealing with Large Language Models (LLMs) like ChatGPT from OpenAI, dynamically generating prompts based on different use-cases becomes essential.
Querying APIs with Python: A Brief Introduction for Aspiring AI Enthusiasts Whether you're new to programming or just new to Python, this blog post will guide you through the basics needed to query APIs, with a focus on querying language models like OpenAI's ChatGPT.
Harnessing the Power of ChatGPT for Data Science Queries Welcome back to datascience.fm, where we continuously bring the latest tools and techniques in the realm of data science to your fingertips. Today, we delve into the world of chatbots, specifically focusing on OpenAI's ChatGPT, a variant of the famous GPT-3 model tailored for conversational AI.
Unraveling the Magic of L1 & L2 Regularization In the world of machine learning, we frequently find ourselves balancing on the tightrope of model complexity. Regularization, particularly L1 and L2, emerges as our safety net, allowing models to learn while being constrained.
Monte Carlo Simulation Across Industries: Real-Life Applications and Case Studies The Monte Carlo Simulation (MCS) is not merely a fascinating blend of probability theory and computational prowess. Here, we delve into some intriguing real-life examples and case studies.
Bootstrap Statistics Across Industries: Real-Life Examples and Case Studies Bootstrap statistics, a brainchild of Bradley Efron in the late 1970s, has emerged as a powerful tool for statistical inference, especially when analytical solutions are difficult or infeasible to derive.
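The resampling idea behind bootstrap statistics can be sketched in a few lines. This is a minimal illustration of a percentile bootstrap interval, not code from the post itself; the function name `bootstrap_ci`, its parameters, and the sample values are assumptions made up for the example.

```python
import random
import statistics

def bootstrap_ci(sample, stat=statistics.median, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same interval
    n = len(sample)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles of the estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]

# Made-up data, purely for illustration.
delivery_days = [2, 3, 3, 4, 4, 5, 5, 6, 8, 12]
ci_low, ci_high = bootstrap_ci(delivery_days)
print(f"Bootstrap 95% interval for the median: ({ci_low}, {ci_high})")
```

The percentile method is only one of several bootstrap interval constructions; it is used here because it is the simplest to read.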
Unraveling the Gini Coefficient: A Deep Dive for the Technical Mind In the world of income inequality, metrics, and indices, there's one term that has certainly captured the limelight – the Gini Coefficient.
Confidence Intervals: A Deep Dive for the Technical Professional At its core, a Confidence Interval is a range of values we are fairly sure our true value lies in.
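As a rough illustration of that "range of values", the sketch below computes a normal-approximation interval for a sample mean. The helper name `mean_confidence_interval`, the choice of z = 1.96 (roughly 95% coverage), and the data are assumptions for the example; for small samples a t-based interval would usually be preferred.

```python
import math
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Normal-approximation interval for the mean (hypothetical helper).

    z=1.96 corresponds to roughly 95% coverage under the normal approximation.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    sem = statistics.stdev(sample) / math.sqrt(n)
    return mean - z * sem, mean + z * sem

# Made-up measurements, purely for illustration.
heights_cm = [168.0, 172.5, 165.2, 180.1, 175.3, 169.8, 171.4, 177.0]
low, high = mean_confidence_interval(heights_cm)
print(f"Approximate 95% CI for the mean: ({low:.1f}, {high:.1f})")
```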
How to build an AI Product (8) - Harnessing the Power of Embeddings, Vector Search, and LLMs: A Glimpse into Modern Applications In the digital age, the continuous evolution of technology propels innovations that were once confined to research labs into everyday applications. Among these advances, the use of embeddings, vector search, and large language models (LLMs) like ChatGPT stands out, ushering in a new era of sophisticated tools that can comprehend,
How to build an AI Product (7) - Build an MVP UI for your backend service When paired with the robust FastAPI framework, a tool like jQuery can be instrumental in achieving this. FastAPI, known for its high performance and Pythonic ease, serves as an optimal backend choice for modern web applications.
How to build an AI Product (6) - Creating a Docker environment for easy experimentation Docker has revolutionized the way we build, ship, and run applications. It's an indispensable tool for developers and data scientists, allowing for consistent and reproducible environments. But why Docker? Let's find out.
How to build an AI Product (5) - FastAPI Backend Service: Return Similar Sentences FastAPI is a modern web framework for building APIs with Python 3.6+ based on standard Python type hints. It's known for its speed and easy-to-use syntax. We create a simple FastAPI backend service that accepts a sentence and returns similar sentences.
How to build an AI Product (4) - Efficiently Searching Vector Spaces with Annoy Enter Annoy (Approximate Nearest Neighbors Oh Yeah) – a powerful library designed for fast approximate nearest neighbor search.
How to build an AI product - (3) Exploring the World of Embedding Models for Diverse Tasks Over the years, a plethora of embedding models have been designed, each fine-tuned for specific tasks. Let’s embark on a journey exploring these models, with a particular focus on the offerings from the Sentence Transformers library.
How to build an AI product - (2) Dive into Text Embeddings using Sentence Transformers and kNN Let's dive deeper into the realm of text embeddings. Text embeddings transform human-readable content into numerical vectors, making them palatable for machine learning models.
How to build an AI product - (1) Finding Interesting Datasets and Document Collection The quality and relevance of datasets are paramount in building effective AI models. In this post, we will dive into some unique and fascinating datasets that can set the foundation for intriguing AI applications.
Why Every Chemistry Graduate and Biotech Major Should Learn Python and the RDKit Library Introduction We've entered an age where the barriers between science and technology are becoming increasingly blurred. Artificial intelligence (AI) is finding its way into various sectors, including the world of chemistry and biotechnology. Python, a universally applauded programming language, has become the lingua franca of scientific computing. For
How to Use Prompting Techniques to Improve Outputs from Large Language Models Prompting is the technique of presenting a concise piece of text or information to LLMs to instruct them on the task at hand. By adequately feeding the LLMs information about the task, input data, and contextual specifics, we can guide their output towards the desired result.
Harnessing the Power of 280+ NLP, LLM, and Transformers Research Papers With expertise from roles at Netflix and LinkedIn, Harsh curated 280+ NLP papers from Arxiv into an interactive Kaggle notebook where you can chat with these research papers.
"Talk to Your Documents" Series: A Deep Dive with Harsh Singhal The digital space is brimming with information, and every day, we find new methods to interact with data. One of the emerging ways is through conversational AI, where we can communicate with documents, extract information, and generate meaningful outcomes.
Elevating Patent Analysis with AI: A Peek into the Future Unlock the power of AI in patent analysis with our latest tools and insights. From instant summaries to advanced AI-driven Q&A features, discover how Data Science is reshaping the future of intellectual property research and analysis.
Unlocking Expertise with Language Models: An Exploration using Expert Prompting Techniques with Examples for ML, AI and CS Questions A recent prompting approach proposes to ask LLMs to respond as an expert. It involves 3 different steps: * Ask LLM to identify experts in a given field related to the prompt/question * Ask LLM to respond to the question as if it was each of the experts * Make final decision
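The three steps above can be sketched as a small pipeline. This is an assumption-laden outline rather than the post's implementation: `expert_prompting`, the prompt wording, and the `ask_llm` callable (any function that takes a prompt string and returns the model's text) are all hypothetical, and `fake_llm` is a stand-in so the sketch runs without an API key.

```python
from typing import Callable, List

def expert_prompting(question: str, ask_llm: Callable[[str], str], n_experts: int = 3) -> str:
    """Run the three-step expert prompting flow (hypothetical helper)."""
    # Step 1: ask the model to identify experts relevant to the question.
    names_reply = ask_llm(
        f"Name {n_experts} experts best suited to answer the question below, "
        f"one per line.\nQuestion: {question}"
    )
    experts: List[str] = [
        line.strip() for line in names_reply.splitlines() if line.strip()
    ][:n_experts]

    # Step 2: ask the model to answer as if it were each expert.
    answers = [
        ask_llm(f"Answer the following question as {expert} would.\nQuestion: {question}")
        for expert in experts
    ]

    # Step 3: ask the model to weigh the expert answers and make a final decision.
    combined = "\n".join(f"{e}: {a}" for e, a in zip(experts, answers))
    return ask_llm(
        "Given the expert answers below, give a single final answer.\n"
        f"Question: {question}\n{combined}"
    )

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, used only to make the sketch runnable."""
    if prompt.startswith("Name"):
        return "Expert A\nExpert B\nExpert C"
    return f"[model reply to a prompt of {len(prompt)} characters]"

print(expert_prompting("When should I prefer L1 over L2 regularization?", fake_llm))
```

In practice `ask_llm` would wrap whichever chat API is in use; keeping it as a plain callable leaves the flow independent of any particular provider.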
Decoding the Basics: 10 AI-Generated Prompts for Teaching Machine Learning to Beginners Here are 10 prompts that professors can use to instruct undergraduate students in Machine Learning over the course of 3 months: 1. Prompt 1: "I want you to act as a machine learning instructor for beginners. Start by explaining what Machine Learning is, how it works, and why it&
Decoding AI for Academia: A 10-Step Guide to Effectively Implement ChatGPT and GPT-4 in Teaching To use large language models (LLMs) like ChatGPT or GPT-4 effectively, professors can follow these steps: 1. Understand the Technology: The first step is understanding how these AI models work. Familiarize yourself with the capabilities and limitations of LLMs. Explore OpenAI's documentation and resources to get started. 2.
Teaching Empowered: Optimizing Prompts for Language Models in Engineering Education Optimizing prompts can greatly improve the outcomes when using language models like ChatGPT or GPT-4. Here are some ways an engineering college professor can use this strategy: Teaching Process: To enhance their teaching process, professors can create prompts that ask ChatGPT to explain complex concepts in simple terms or provide
Revolutionizing the Classroom: Seven Ways Professors Can Utilize ChatGPT in Higher Education Using AI tools such as ChatGPT can indeed help faculty members streamline their tasks and enhance the educational process. Here are a few ways they can utilize ChatGPT in their day-to-day activities: 1. Teaching Assistant: ChatGPT can be utilized as a virtual teaching assistant. It can handle common student questions
Unlock the Power of AI for Your Job Search: Introducing Our New ChatGPT-Powered Tool We are excited to introduce our new tool, powered by OpenAI’s ChatGPT, designed to aid job seekers in writing compelling cover letters.
Reskilling for the AI Revolution: A Guide to Prompt Engineering Embrace the future of AI with Prompt Engineering, a powerful reskilling opportunity for professionals returning to the workplace.
Unpacking Text Prompts for Image Generation: A Deep Dive into Furniture and Interior Design Prompts Explore the breakdown of text prompts for AI-powered image generation in the domain of interior design and furniture. We dissect a list of prompts, identifying key style elements and tags for an intuitive user interface that simplifies the generation of customized, hyper-realistic designs.
Unleashing the Power of Social Sentiment: 30 Innovative Machine Learning Projects for Financial Data Science 30 innovative project ideas leveraging Twitter sentiment and Python to offer fresh perspectives for traders, investors, and financial analysts.
Taking The First Step Towards Becoming a Data Scientist With Python: Discover Our YouTube Playlist! Hey!! aspiring data science wizards! If you've ever found yourself daydreaming about breaking into the incredibly dynamic and expanding world of data science, you're at the right place. Data science is more than a buzzword today. It is the backbone of countless industries, fueling decisions with
Unleashing the Power of AI in Chemical Research with molecule-search.com Have you ever wondered how to navigate the vast universe of chemical compounds? If you're a researcher, student, data scientist, or professional in drug discovery, chemoinformatics, or patent law, you know that finding the right molecule can be like finding a needle in a haystack. But what if
Machine Learning 101 for Everyone If you're interested in machine learning but don't know where to start, you're not alone. Machine learning is a complex and rapidly evolving field that requires a solid foundation in mathematics, statistics, and programming. Unfortunately, there is a plethora of resources available online and
Every Data Professional Should Know About the Common Crawl Project The Common Crawl dataset is a large collection of web pages and their associated text and images, which is made available to researchers and developers by a non-profit organization of the same name.
Secrets of building DALL-E and other text to image AI models Text-to-image models are one of the breakthrough technologies in AI. Learn all about how these models are developed and how massive amounts of data are collected for them.
Setup Your own Postgres Database and Analytics Playground Learn how to analyze play store apps data in Postgres by running your own Analytics playground with Docker
Playstore Apps Data Analysis with SQL (free SQL ebook included) Set up your own Postgres database and Analytics playground. Learn end-to-end SQL Analytics with our 30+ questions on the Playstore apps dataset.
Join Our LinkedIn Group Datascience.fm is proud to announce our official LinkedIn Group Our LinkedIn group will be exclusive and focus on bringing incredible value to our members. Some of the highlights you can expect from this group will be; * Exclusive job posts for our members. These job opportunities will be from global
Data Science and Machine Learning Handbook We proudly present the Data Science and Machine Learning Handbook to our readers worldwide. The Data Science and Machine Learning handbook takes you through 30 of the most important algorithms in Data Science and Machine Learning. Each chapter is dedicated to; * An important Data Science and Machine Learning Algorithm * Illustrations
Android Apps Data Analysis with Google sheets As a Business graduate in a world where Artificial Intelligence and Data Science are taking over industries and optimizing various business processes, it is critical for you to embrace AI.
TextDistill - a simple tool for finding topics and word clouds TextDistill - an AI powered tool for Topics discovery and sentiment analysis.
Topic Modeling - A Quick Overview The amount of data being collected and stored is growing by leaps and bounds every year. Text data is generated across all industries and is growing as fast as any other data type. Businesses across the world are turning to Artificial Intelligence to make sense of large volumes of raw
Announcing Our YouTube Channel Check our YouTube channel where we post news about Data Science and Artificial Intelligence so you can make informed career decisions and stay updated on important industry news
Awesome AI Bootcamps For College Students College students interested in developing a career in Artificial Intelligence, Data Science or Analytics can find themselves without quality guidance. This is where Industry leaders can come in to help students find a path that is without many distractions. One such industry leader helping students is Harsh Singhal. Harsh Singhal
Pandas Walkthrough E-book on 3 Most Important Concepts Pandas Walkthrough - our first ebook that teaches you 3 key concepts in Pandas that you need to master as a Data Scientist
Top Python Interview Questions Introduction A quick beginning to a series of posts which will guide you through the top Python interview questions asked in any data science interview. Check out some great career options that require Python as a skill: Jobs Archive - Deep Learning Careers The #1 Job Board for Deep Learning &
Data Science Notes for Freshers This post guides you through the most important ML algorithms according to the syllabuses of various Data Science degrees offered by Indian Institutions - Quick notes for all Data Science freshers.
Top Data Science Interview Questions for Freshers This post covers: • Resources to clarify Data Science career-oriented questions. • Top Data Science fresher interview questions and their answers. • Various learning resources linked to the respective questions.
Mega Guide to Pandas for Data Scientists In summary, we have taken a high-level rundown about python pandas, what it has to offer, various resources to learn it, and more.
Industrial Tools Every Data Scientist Should Know The top Data Science industrial tools that a data enthusiast should be aware of. These tools are divided into three categories: Coders, Clickers, and AutoML. Read on to find out more.
The only Data Science Roadmap You will ever need This post will guide you through a detailed roadmap to becoming a Data Scientist that charts out which skills you want to accumulate and the resources which you can rely on for the same.
Top Data Science Internships This Summer And How To Crack Them This post will introduce you to some amazing Data Science summer internships to apply to this summer and also how to build a portfolio to crack them.
How to be an awesome Android Marketplace Data Analyst using SQL -Part 3 The final chapter of our three-part series 'Data Analysis using SQL'. A quick-to-follow-along series to help you take over the Android Marketplace Analysis.
ML Models recognize Images that are nonsense. Is that a Problem? Overinterpretation - a cause of concern for neural networks. Models trained on CIFAR-10, for example, made confident predictions even when 95 percent of input images were missing, and the remainder is senseless to humans.
How to be an awesome Android Marketplace Data Analyst using SQL -Part 2 Part two of our 'Data Analysis using SQL' using the real-life Android Apps Dataset. A guide to answering some business questions using SQL.
Predict using Data Science- How long will an Employee stay at a Company? Matt Eblen recently answered a question most employers want to know — how long an employee will stay at a company
How to be an awesome Android Marketplace Data Analyst for Beginners - Part 1 This post will guide you through writing SQL queries using a real-life dataset of Android apps to answer some analytics-oriented questions.
Write SQL queries effortlessly with Falcon by Plotly This post guides you through downloading and setting up Falcon - a free, open-source SQL editor which enables you to write your SQL queries without the hassle and moreover gives you an option to visualize the results.
Learn about Neural Turing Machine (NTM) This post introduces the recurrent neural network model of a Turing Machine - Neural Turing Machine. Read on to learn about NTMs.
Learn about Gradient Boosting This article gives you a tour through one of the most powerful ML algorithms of all time, 'Gradient Boosting'. Read on to learn its workings, pros and cons, and more.
January 10, 2022 - SQL Queries tutorial, Data Analysis using Pandas & Plotly, and more Wrote 2021 instead of 2022 in the heading - we are still in that stage. Check out today's newsletter to catch up with some informative content around the data world.
Top Artificial Intelligence and Machine Learning Startups to look out for in 2022 According to statistics, the artificial intelligence market was valued at a striking USD 62.35 billion in 2020, in the midst of the pandemic.
Learn about Deep Belief Network (DBNs) Introduction Greetings present & future Data-Scientists! If you want to learn Deep Belief Network in a comprehensive yet simple way, you’re in the right place. You’re going to do just fine with the knowledge of the deep belief network in this article. This is an era of artificial intelligence.
Make Money by Scraping Data, Top Amazon Innovations, Data Analysis using Pandas & more stories New year, New Newsletter format! Catch up with the latest chatter around the 'Data World'.
Learn about Support Vector Machines Support vector machines (SVMs) are powerful yet flexible supervised machine learning algorithms that are used both for classification and regression. But generally, they are used in classification problems.
Learn about K-Means Clustering K-means clustering is a method of unsupervised learning used when you have unlabeled data. A typical unsupervised algorithm draws inferences from the input vectors alone, without referring to labeled outputs.
What were the Top Python Libraries of 2021? Let's end the year with a 'Rewind' featuring 'Top Python Libraries of 2021' and more stories.
12 Python ML/DS Libraries Used By Top Data Scientists In this post, we talk about the top and most useful python libraries used by top data scientists around the world for ML and DS
Classification Metrics Every Data Scientist Must Know Learn about some of the top classification metrics every data scientist should know.
If You Want To Learn NLP, Start here: An Introduction to Natural Language Processing (NLP) Are you a newbie who wants a solid NLP foundation? This is the best place to start. Learn the basics of NLP here and get started!
State of Data Science in 2021, Data Governance & more stories this week "11:07 am is the average time that people in the UK had already sipped on their first alcoholic drink during the Christmas holidays". So pour a glass (maybe), and catch up on the latest Data-World insights.
December 24, 2021 - MIT's AI development can predict breast cancer 5 years prior? Did you know, according to statistics, Santa’s sleigh would weigh around 354,430 tonnes on Christmas Eve due to the number of houses he has to deliver to? Let's catch up with the chatter going on in the 'World of Data' before Santa gets here!
2021 AI Index Report, JP Morgan & Data Science, Custom GPT-3 by OpenAI, and more "Did you know the number of AI journal publications grew by 34.5% from 2019 to 2020?" Putting forward more such stories in this week's catch-up.
Managing 'Management' with ML Research found that managers were able to account for far less of their teams' work than expected, but machine learning algorithms closed the gap. Read today's stories to find out more.
What's new in Django 4.0? and more stories Nothing provides you a better reflection of yourself than your Spotify Wrapped. Yes, it's that time of the year again. Let's dive into today's newsletter and discover how DoorDash retrained its models due to COVID-19 and much more.
Developing AI to understand object relationships Reddit is a home for amazing one-liner data science jokes, here check this out. "How did the random variable get into the club? By showing a fake i.i.d"
Tutorial to write SQL queries using Python and visualizing the results in Tableau - Part 4 It is a very interesting fact that the colour white has the highest resale value. Ever wondered why? It is all a game of demand and supply: people generally prefer to buy a white used car over a green, grey, or custom one. Similarly, we have been drawing inferences
Tutorial to write SQL queries using Python and visualizing the results in Tableau - Part 3 Another business problem for your SQL and Tableau learning regimen. Here we'll be dealing with the code and how to visualize it in Tableau. > Dataset: Vehicles.csv [../input/vehicless] What makes a car worth buying? It is the cost-effectiveness and a brand name that is linked to
Do you trust AI? and more stories Did you know that 2021 reported the highest data breach costs to date? Check out today's newsletter to know more.
Is the future of 'open source' sustainable? What did one support vector say to another support-vector? "I feel so marginalized". Today's rundown is on AI policies, data infrastructures, and much more. Stay engaged to discover.
Artificial Intelligence (AI) Policies in India- A Status Paper As of now, several governmental bodies and policies are studying and aiding AI. Several papers are being published to share the Indian government’s findings on AI. Today we will explore one such paper, “Artificial Intelligence (AI) Policies in India- A Status Paper”, an in-depth analysis of all AI policies
No Code AI: Here to stay? Before heading deep into the week, a quick summary of what is going on around the data world.
What's new in Python 3.10? and more updates this week Posted every Monday and Friday, this newsletter offers an efficient way to stay on top of Data Science news, events, and more.
Tutorial to write SQL queries using Python and visualizing the results in Tableau - Part 2 Introduction Seeing your interest in the previous SQL query posts, we have curated this series to work through each query in a more structured way: firstly, by understanding the objective behind carrying out this activity. Secondly, by working out the query in Kaggle to get the desired table. Thirdly, by
Tutorial to write SQL queries using Python and visualizing the results in Tableau - Part 1 This post walks you through the SQL queries in the Kaggle Python notebook and shows you how to visualize them using Tableau. We are taking the following dataset for our reference: > Dataset: Used Cars Dataset [https://www.kaggle.com/austinreese/craigslist-carstrucks-data] How has the price of vehicles changed over
Is SqlTransform unavailable for Python? A quick rundown of recent developments and informative content from the world of data.
Top 10 Artificial Intelligence Companies in 2020 What if I told you that this whole article was written by an AI? I just suggested the theme and the layout of the article, and the AI backed it up with research and gave me plagiarism-free content in a span of 5 minutes. I'm just joking, of
DBeaver vs Python: Two Methods To Run Your Queries This is a step-by-step guide on how to run your SQL queries using DBeaver and Python.
Getting Started with SQL Queries - Exercises for Beginners Part-4 Here's a selection of the most useful SQL queries every beginner must practice.
Getting Started with SQL Queries - Exercises for Beginners Part-3 Here's a selection of the most useful SQL queries every beginner must practice. This article will not only provide you with the answers but also the explanation to each query.
Getting Started with SQL Queries - Exercises for Beginners Part-2 Here's a selection of the most useful SQL queries every beginner must practice. This article will not only provide you with the answers but also the explanation to each query.
Getting Started with SQL Queries - Exercises for Beginners Part-1 Here's a selection of the most useful SQL queries every beginner must practice. This article will not only provide you with the answers but also the explanation to each query.
A World of Opportunities: Top Kaggle Competitions with Cash Prize Learn about the standard competition categories in Kaggle which will help you apply all your Data Science knowledge while also giving you a chance to win cash prizes.
Top 10 Data Science Blogs from Major Tech-Firms We learn about important insights from trending data science posts published by great minds working at top firms.
Data Scientists Reveal: 40 Best Cheat-Sheets on the Web This article links not only to the cheat-sheets but also to some relevant sources that will help you in your Python coding. This will help you get through some of the trickier tasks in data science.
Can You Get Paid for Scraping Data? Learn with us what web scraping and Scrapy are and how you can use them to your benefit.
Here is how MNCs are using Machine Learning to get ahead of their competition We will go through some of the top machine learning case studies from world-class businesses to get an idea of how Amazon Web Services has been transforming institutions across industries in their computing needs and much more.
History of Artificial Intelligence In this article, you will read about the history of artificial intelligence and how it came into existence.
Top Evaluation Metrics For Your NLP Model Learn about the top evaluation metrics for your next NLP model.
Pandas Group By - Key areas you should watch out for We surveyed Stack Overflow questions related to Pandas Group By and came across a few areas that have been repeatedly discussed: computing multiple statistics for each group, sorting within groups, and extracting the first row in each group. Extracting multiple statistics for each group: Common aggregation functions are mean, median, and
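The three areas named in the excerpt above can be sketched in a few lines of pandas. This is only an illustrative sketch, not the code from the post itself: the `team` and `score` columns and their values are made-up assumptions.

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative assumptions
df = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "score": [10, 30, 5, 15, 25],
})

# 1) Computing multiple statistics for each group
stats = df.groupby("team")["score"].agg(["mean", "median", "max"])

# 2) Sorting within groups: order by team, then by score (highest first)
sorted_df = df.sort_values(["team", "score"], ascending=[True, False])

# 3) Extracting the first row in each group (here: the top score per team)
top = sorted_df.groupby("team").head(1)

print(stats)
print(top)
```

Sorting before calling `head(1)` is one of several ways to pick a "first" row per group; which row counts as first depends entirely on the sort order chosen.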
Explore the Twitter graph dataset and learn how to analyze complex graphs using simple exploratory data analysis (EDA) Introduction Most of us think of Twitter as a social network, but it's actually something much more powerful! Twitter is a social news website. It can be viewed as a hybrid of email, instant messaging and SMS messaging all rolled into one neat and simple package.
Top IDEs in Python That Every Data Scientist Should Know How would you feel if you got the freedom to design a layout, write the code, test, debug, build, etc., all in one place? Isn’t that epic? That is what an IDE is for! Overview * For an effortless coding experience, it is crucial to choose an IDE for
Top Data Scientists You Should Follow On Twitter Data science is all around us these days, from academia to major industries and tech giants, and the demand has been growing exponentially. So, what is data science all about anyway? And who are the data scientists to follow online for expertise in this fast-growing field? Here is a list of
5 Most In-Demand Machine Learning Projects That Will Get You Hired We talk about the various ML projects that will build your confidence and get you job-ready!
What is an easy and sustainable approach to transition to Data Science? If you are in undergrad and are contemplating a career in Data Science and Analytics, pick up tools and technologies that enjoy industry-wide usage such as Tableau, PowerBI, R, Python and others that are commonly used. Learn from online courses and free YouTube resources such as freeCodeCamp. If you
If you had the chance to go back in time, what would you do differently about your data science journey? Going back in time is a great way to understand what it is that one needs to do now to change course for the better. I love this trick that makes for great conversations and self-examination. One thing I will change if I can go back is to spend more
How to evaluate a Master's degree in Data Science? If you keep your notebook close, and your pencil even closer, and make notes and connect the dots, and you do this even for a few weeks (30-45 mins daily), at the end of a month you would have gained a whole lot of perspective and understanding.
How to know which domain of data science is suitable for you? All of a sudden, a company will go from collecting very little data to now collecting data from their website (views, clicks), employee data, manufacturing (from machines, processes, employee activity) and any other aspect of their business that needs to be improved.
Consistency is the key to building long-lasting value Shivannas is a small store in Bangalore on JC road. I used to go there as a kid almost three decades ago with my parents. This was my dad's hangout when he went to college nearby. My grandfather used to stop there to pick up nippat, have
What is the "quick start" approach to learning Machine Learning? Take up one topic from your notes and go research it by googling it, finding Wikipedia articles, blogs, more examples of its usage, read related discussions on StackOverflow and gain as much understanding of the concept as you can.
I am trained in statistics but worry what I can contribute? Should I focus on technical skills more? Please help. > The discussion shared below was part of many Q&A sessions Harsh Singhal conducted with Data teams at various companies and colleges. Analytics is statistics applied with code (code = SQL/Python/R/Java/etc.) E.g., when you identify the average or variance of an attribute in a
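As a small illustration of "statistics applied with code", the average and variance of an attribute can be computed with Python's standard library. The `order_values` list below is a made-up assumption, not data from the original discussion.

```python
import statistics

# Hypothetical attribute values (e.g. order values); purely illustrative
order_values = [2, 4, 4, 4, 5, 5, 7, 9]

# Average of the attribute
average = statistics.mean(order_values)

# Population variance: mean squared deviation from the average
variance = statistics.pvariance(order_values)

print(average, variance)
```

`statistics.pvariance` treats the list as the whole population; `statistics.variance` would be the choice when the values are a sample.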
What can be done to enhance my career in the field of Data Science? Everyone learns differently. But to know what works for you, choose a system that you found successful in school or college. Often we used to sit with a resource (usually a book) and read and take notes. Do the same now. The process is the same but the resources have changed.
How to build a data-driven mindset? A mindset is a result of what you repeatedly do. A data-driven mindset means that you bring data to decision making, repeatedly. Bringing data to decisions implies a lot of things. * It could mean developing engaging dashboards. * Developing datasets to make it easy for other Analysts to run deep-dives. * It
How can a business analyst transition to ML? Especially when analysts get little or no exposure to ML in their job? Participate in Kaggle.com and learn AI/ML on https://www.fast.ai/ (free). Do excellent Analytics work by day for your company and spend your evenings pursuing your AI/ML dream. A resource like www.fast.ai [http://www.fast.ai] has made it very easy for folks to
Do I need to have basic knowledge in every tool out there for data science? If you can create a list of every tool existing in data science, that list itself will be fascinating to share with others. After you have done the research to create such a list, add a few lines to summarise what each tool does and one or two pros and
Motivations for staying in a role for longer than a few years? > The discussion shared below was part of many Q&A sessions Harsh Singhal conducted with Data teams at various companies and colleges. People stay and leave for a variety of reasons. As long as you are convinced to stay or go, then your reasons are the right reasons.
How to plan and execute the company's ML project for better business impact? An ML project, like any other project, requires the project team to get together in the early stages to resolve lots of open questions to better scope the project. What is the problem being addressed and why is it seen as a problem that needs to be solved? Very often
How useful is a master's degree in data science? Is it really worth it? Or can I learn the same things on the job? First, identify what it is you want to learn. Can you come up with a list of topics and projects that you wish to work on? Then, you will have to research various degree programs and their curricula for overlap. Additionally, you should then look at different companies and the
How to choose a path (Data Engineering, BI, Data Science, Product)? Careers will span longer periods (decades and more) increasingly, so people have to be ready to make changes throughout their careers. Changes that are made in the pursuit of media-driven hype can be difficult to sustain as new hype cycles will quickly replace the previous ones. What is sustainable is
How can I have a successful career in e-commerce analytics? What does success mean to you? This is a broad question to an already broad question :-) Try to identify what interests you about e-commerce and the aspects of e-commerce that Analytics can help improve. So many aspects are involved in making a sale online such as recommendations, product placements, advertisements,
What are the essential skills required for each role in the Analytics domain? And how to pursue it? If you have a job, use the industry you are in as a starting point to explore your aptitude and interest. You might like to write lots of SQL queries, extract insights, and make interesting presentations with spreadsheets and slides. This is as useful and important as someone writing data
How to clean up past projects? Or call it "legacy" and just move on? The first step is a detailed assessment. Is there a document that takes up different aspects of the previous project to identify pros and cons? Was there a requirements document that was used as a guide to developing the project that is now being called “legacy”? If not, then write
Gain relevant experience by creating your own project Common Crawl (CC) is a very popular dataset and can satisfy a great many text analysis and NLP tasks. If you have an AWS account you can access the CC index files in Parquet format. Follow the instructions shared in this article [https://commoncrawl.org/2018/03/index-to-warc-files-and-urls-in-columnar-format/] to get
Fast HTML parsing in Python I love the BeautifulSoup library in Python. It is one of those libraries that just work and make your life easy. I recently came across a fast parser for HTML in Python, https://github.com/rushter/selectolax and am waiting to give it a spin. The few articles I read where selectolax
Exciting NLP and Computer Vision research for Patent Chemistry discovery and analysis In 2021, a bunch of very smart and driven ML/AI engineers and I are working on developing NLP and CV algorithms to discover and extract interesting chemistry from patents. We developed http://ichemist.ireadrx.ai [http://ichemist.ireadrx.ai/] where a user can search for chemistry compounds and find
Analyzing Bitcoin Transactions on Google BigQuery I've written some queries for analyzing BTC transaction data: https://sql-recipes.netlify.app/ This is part of a broader project of mine to analyze many other publicly available datasets with Google's BigQuery. Go take a look.
Postgres text similarity with commoncrawl domains Commoncrawl [https://commoncrawl.org/] is a public repository of web crawl data made available for analysis. In this post I want to extract the list of domains crawled, stick them into a Postgres database and play with the text similarity functions provided by the pg_similarity extension. The basic steps to be
Multiple Plots using Ggplot2 I am a big fan of the tidyverse [https://www.tidyverse.org/] set of libraries, especially ggplot2. While there is a raging debate on the use of base R vs tidyverse to teach R to beginners, I will choose tidyverse for the convenience and for the connected ecosystem of many libraries
Learning Google Colab using Kaggle Datasets: A Beginner’s Guide Learn to import a Kaggle dataset into Google Colaboratory using seven easy-to-follow steps.
Just Learn SQL! SQL is everywhere. From databases that we are all too familiar with such as Oracle, MySQL, Postgres to Big Data offerings such as Hive, SparkSQL, Presto, Redshift, Athena and BigQuery to name a few.
Video Game Sales Analysis Analyze the 'Video Game Sales Analysis' dataset with the help of the pandas library and learn the fundamentals of plotly by visualizing the results using simple yet interactive graphs.
Customer Churn Analysis Analyze customer churn using a telecom company dataset. Identify different attributes and answer questions like "Are we losing customers?" and "If so, how?"
Starter Datasets for Data Science: A blog around the top 10 datasets for beginners Understand different datasets from various domains, the role of data science in each one of them and learn to ask the right questions to get the best results from the given data.
Analyzing Startup Investments Analyzing the startup investment dataset to gain insights about how different features are related to startups across the globe.
Data Analysis and Visualization in the retail/FMCG sector Analyze the 'BigMart Sales' dataset with the help of the pandas library and learn the fundamentals of plotly by visualizing the results using simple yet interactive graphs.
Analyzing the Spotify dataset to gain insights in the music industry Building a strong foundation through the pandas library by working on the 'Spotify' dataset. We will discuss some very basic tools that pandas provides to help gain insights into any dataset in the music domain.
Analyzing the Flipkart Sales Dataset to gain business insights Let's learn to do a deep analysis of the Flipkart sales dataset using the pandas and plotly libraries. We will take into account various features to gain key insights into our data.
Analyze Chipotle orders dataset with Pandas to develop insights for digital ordering app Pandas is a very powerful library enabling you to perform analytic operations with ease and speed. If you need to draw valid conclusions from your data, Pandas should be your first choice. In this series of posts, I'll show you how to use this feature-packed Python data analysis library.
What does Data Science mean for businesses Data Science (DS) has indeed captured the imagination of entire swathes of industries and domains. With the now famous statement that "software is eating the world" actually ringing true, businesses are undergoing digital transformation as a matter of routine. As part of these digital initiatives, businesses are also
Bringing clarity when hiring a Data Science professional Interviewing candidates for a Data Science position can be a challenge. I've developed hiring processes in the past and continue to work with some of my founder friends to tweak and refine their Data Science hiring initiatives.
Simple deep learning environment setup I came across Deepo [http://ufoym.com/deepo/] recently and was amazed at the level of simplicity that is now available if one wants to start tinkering with deep learning. Check out the deep learning libraries you can set up with a single command. Head over to their Quickstart [http://ufoym.
Presenting your work For those graduating with a degree in Analytics and Data Science, you have many resources to work with when it comes to presenting your skills to hiring managers. Some examples below. Search for filetype:pdf and case studies in your field. Identify the ones that stand out. Now go over
Thoughts on the use of AI models distributed online AI model marketplaces come in various shapes and sizes. Outside of the conventional marketplace approach, large Internet companies have made open-source contributions that include not just libraries but also models. BERT [https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html] is an example of a model distributed online by Google that
Crowdsourcing IP reputation data from online forums In this post, I discuss the topic of crowdsourcing IP reputation data from online forums. The post is inspired by a paper I read recently, Gharibshah J., Papalexakis E.E., Faloutsos M. (2018) RIPEx: Extracting Malicious IP Addresses from Security Forums Using Cross-Forum Learning. In: Phung D., Tseng V., Webb
Developing a data product I wanted to develop a product end to end. One product idea was to develop a service that would identify web technologies used by popular sites. The "end to end" development of such a product would require crawling URLs, parsing of raw website data, data processing, server-side web
Deploy a scalable image labeling service Develop an image labeling service using a pre-trained deep neural network and scale the service by deploying to the cloud.
Label images with a deep neural network It isn't often that you need to label lots of images. But when the need arises you can use many of the pre-trained deep neural network models available for image labeling tasks. One of the most popular ones is ResNet50 and Keras provides a convenient [https://keras.io/
Learning Docker by building an R learning environment Docker [https://www.docker.com/] is a technology that makes it very easy to try a piece of software or technology without running into installation problems that you would otherwise run into if you were to install software directly or natively on your system. Docker gives you an environment that
Learn SQL in a browser with PostgreSQL and pgweb PostgreSQL is a very versatile database. If you want to learn SQL, then a quick way to start is to 1) grab some data you want to analyze 2) insert into a PostgreSQL table and 3) use a SQL client such as pgweb and get started analyzing data. You can