I planned a DeepSeek article on my Trello board a few months ago, but I never published it. Now DeepSeek articles are everywhere.
My original title was “DeepSeek, a free alternative to GPT o1.” However, after reading those articles and testing DeepSeek for a month, I have changed my mind. It is mind-blowing!
In this article, we will explore its features, as we always do. From a data scientist’s point of view, we will do a mini data project!
Most popular 1000 YouTube videos
Let’s use this dataset. We all watch YouTube videos daily, so maybe we will find a funny or interesting video along the way.
Kaggle dataset: https://www.kaggle.com/datasets/samithsachidanandan/most-popular-1000-youtube-videos
Great, let’s download the dataset from the link above and explore it further!
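Before handing the file to DeepSeek, it can help to peek at it yourself. Here is a minimal sketch that uses only Python’s standard library. The tiny sample inside the code is made up so the sketch runs without the download, and its column names are assumptions, so check them against the real file.

```python
import csv
import io


def summarize_csv(handle, preview_rows=3):
    """Return column names, row count, and the first few rows of a CSV."""
    reader = csv.DictReader(handle)
    preview = []
    row_count = 0
    for row in reader:
        if row_count < preview_rows:
            preview.append(row)
        row_count += 1
    return {
        "columns": reader.fieldnames or [],
        "row_count": row_count,
        "preview": preview,
    }


# Made-up stand-in rows so the sketch runs without the download.
# The column names are assumptions; the real file may differ.
SAMPLE_CSV = """rank,Video,Video views,Likes,Category,published
1,Example video A,1000,100,Music,2019
2,Example video B,800,50,Education,2021
"""

summary = summarize_csv(io.StringIO(SAMPLE_CSV))
print(summary["columns"])
print(summary["row_count"])
```

For the real file, you would pass an open file handle instead, for example `open("youtube_top_1000.csv", newline="", encoding="utf-8")`, where the file name is a placeholder for whatever your Kaggle download is called.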
Data Exploration
Let’s start simple. Use the following prompt:
tell me what is this csv is all about:
Here is the output.
Output of DeepSeek
Good, I like it. Before diving into advanced prompts, I am going to use Prompt Perfector.
Prompt Perfector
If you read my articles regularly, you probably know Prompt Perfector. It is a GPT developed by the LearnAIWithME team.
It will be available on their website in a couple of days, but you must become a paid subscriber here.
If you don’t want to, that is okay; you can use the following prompts.
Become a paid subscriber to access Prompt Perfector from here
As you can see, we have great prompts for each subsection of data science. Since we already know a little about the dataset, let’s continue to the data analysis section.
Data Analysis
Here is the prompt that we created:
Conduct an analysis of the most popular YouTube videos dataset.
Identify trends in video views, likes, and dislikes across different categories
and publication years.
Determine correlations between engagement metrics (likes, dislikes, views)
and rank position. Explore which video categories tend to perform best
and whether older videos continue to dominate rankings or if newer videos
are gaining traction.
Here is the first part of DeepSeek’s output.
Output of DeepSeek
Good, let’s see the second part.
Output of DeepSeek
It is excellent, especially for a free alternative! Let’s continue with the data visualization.
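Before moving on, the correlation part of that prompt is easy to cross-check on your own machine. The sketch below is not what DeepSeek produced. It is a minimal standard-library version of two of the requested checks: a Pearson correlation between rank and an engagement metric, and an average per category. The rows are made up for illustration, and both the column names and the comma-formatted numbers are assumptions about the file.

```python
from collections import defaultdict
from math import sqrt


def to_number(text):
    """Parse values such as '1,234,567' into a float (assumes comma separators)."""
    return float(str(text).replace(",", ""))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def mean_by_group(rows, group_col, value_col):
    """Average of value_col for each distinct value of group_col."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        bucket = totals[row[group_col]]
        bucket[0] += to_number(row[value_col])
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Made-up rows for illustration only; column names are assumptions.
rows = [
    {"rank": "1", "Video views": "4,000", "Likes": "400", "Category": "Music"},
    {"rank": "2", "Video views": "3,000", "Likes": "300", "Category": "Music"},
    {"rank": "3", "Video views": "2,000", "Likes": "200", "Category": "Education"},
    {"rank": "4", "Video views": "1,000", "Likes": "100", "Category": "Education"},
]

ranks = [to_number(r["rank"]) for r in rows]
views = [to_number(r["Video views"]) for r in rows]
print(pearson(ranks, views))
print(mean_by_group(rows, "Category", "Video views"))
```

A strongly negative value between rank and views would simply mean that better-ranked videos (smaller rank numbers) have more views, which is what you would expect if the ranking is based on views.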
Data Visualization
Good, let’s create some visualizations. Here is the prompt that we are going to use:
Visualize key insights from the YouTube dataset using appropriate charts.
Generate bar charts for top-performing categories, time-series plots for
publication year trends, and scatter plots to analyze correlations between
views, likes, and rank. Use histograms to show distribution patterns and highlight any anomalies in engagement metrics.
Here is the output.
Screenshot of DeepSeek
Unfortunately, it cannot generate graphs, but it provides ideas.
Also, if you use the normal model instead of DeepThink, it generates code too.
Screenshot of DeepSeek
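If you want to turn one of those ideas into an actual chart yourself, here is a rough sketch of the bar chart for top categories. This is not the code DeepSeek generated. The counting helper uses only the standard library, while the plotting helper assumes matplotlib is installed. The rows are made up, and the `Category` column name and the output file name are assumptions.

```python
from collections import Counter


def top_categories(rows, category_col="Category", n=5):
    """Count videos per category and return the n most common as (name, count)."""
    counts = Counter(row[category_col] for row in rows)
    return counts.most_common(n)


def plot_top_categories(pairs, out_path="top_categories.png"):
    """Draw a bar chart of (category, count) pairs and save it as an image."""
    # Imported here so the counting helper works even without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    names = [name for name, _ in pairs]
    values = [count for _, count in pairs]
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(names, values)
    ax.set_xlabel("Category")
    ax.set_ylabel("Number of videos")
    ax.set_title("Top categories among the most popular videos")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path


# Made-up rows for illustration only; the column name is an assumption.
rows = [
    {"Category": "Music"},
    {"Category": "Music"},
    {"Category": "Education"},
]
print(top_categories(rows))
```

Calling `plot_top_categories(top_categories(rows))` would then write the chart to a PNG file. The same pattern (aggregate first, plot second) works for the histograms and scatter plots mentioned in the prompt.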
Good, now let’s continue to the machine learning section.
Machine Learning
Screenshot of the output
I have been trying to write this article for months now. Whenever I get motivated to finish it and pick it up again, I run into errors, and I wait and wait until my motivation is gone.
Unfortunately, I can’t write the machine learning section because the server is busy. I tried again, but it was busy again!
Final Thoughts
DeepSeek might be great, but I don’t think it will replace ChatGPT in the long run because of its performance issues, such as the busy servers I ran into above.
On top of that, it cannot produce visuals.
Thanks for reading! If you liked what you saw, you can subscribe to us on Substack. You can also visit our platform, LearnAIWithME, which includes Assistants, AI Projects, and Editors Pick. There, you can summarize AI news with a single click!
See you there! Here are the free resources:
Here is the ChatGPT cheat sheet.
Here is the Prompt Techniques cheat sheet.
Here is my NumPy cheat sheet.
Here is the source code of the “How to be a Billionaire” data project.
Here is the source code of the “Classification Task with 6 Different Algorithms using Python” data project.
Here is the source code of the “Decision Tree in Energy Efficiency Analysis” data project.
Here is the source code of the “DataDrivenInvestor 2022 Articles Analysis” data project.
“Machine intelligence is the last invention that humanity will ever need to make.” Nick Bostrom