Overview
Obesity is a global health crisis influenced by multiple factors, including diet, physical activity, genetics, and lifestyle choices. This project uses a survey dataset to uncover obesity trends, identify the factors most strongly associated with them, and propose actionable recommendations.
Dataset Details
This dataset contains survey data covering lifestyle, dietary, and health-related variables. Below is a breakdown of the key attributes and their possible values, with the number of respondents per value in parentheses where available:
Sex
Age
- Integer values (in years)
Height
- Integer values (in centimeters)
Overweight/Obese Families
Consumption of Fast Food
Frequency of Vegetable Consumption
- Rarely (400)
- Sometimes (708)
- Always (502)
Number of Main Meals Per Day
- 1–2 meals (444)
- 3 meals (928)
- More than 3 meals (238)
Food Intake Between Meals
- Rarely (346)
- Sometimes (564)
- Usually (417)
- Always (283)
Smoking
Daily Liquid Intake
- Less than 1 liter (456)
- 1–2 liters (523)
- More than 2 liters (631)
Calorie Monitoring
Physical Activity Frequency
- None (206)
- 1–2 days/week (290)
- 3–4 days/week (370)
- 5–6 days/week (358)
- 6+ days/week (386)
Daily Technology Use
- 0–2 hours (382)
- 3–5 hours (826)
- More than 5 hours (402)
Mode of Transportation
- Automobile (660)
- Motorbike (94)
- Bicycle (116)
- Public transportation (602)
- Walking (138)
Target Class (Obesity Category)
- Underweight (73)
- Normal (658)
- Overweight (592)
- Obesity (287)
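As a quick check on class balance, the counts listed above can be turned into percentages. This is a minimal sketch that uses only the four target-class counts from this list; the variable names are illustrative.

```python
# Class counts copied from the "Target Class (Obesity Category)" list above.
class_counts = {
    "Underweight": 73,
    "Normal": 658,
    "Overweight": 592,
    "Obesity": 287,
}

total = sum(class_counts.values())

# Share of each class in percent, rounded to one decimal place.
class_share = {name: round(100 * n / total, 1) for name, n in class_counts.items()}

for name, pct in class_share.items():
    print(f"{name:<12} {class_counts[name]:>5} {pct:>5.1f}%")
```

The classes are unevenly sized (Underweight is about 4.5% of the 1,610 responses), which is worth keeping in mind when splitting the data and reading model metrics.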
Step 1: Understanding the Dataset
- Description: The dataset contains 1,610 survey responses, each labeled with one of four obesity classes (Underweight, Normal, Overweight, Obesity).
- Key Features: Age, dietary habits, physical activity, smoking habits, and fast food consumption.
- Purpose: Build a foundation for:
  - Visualizing obesity trends.
  - Creating machine learning-based predictions.
Step 2: Data Exploration
- Objective: Familiarize yourself with the dataset structure.
- Import libraries and load the dataset.
- Examine the data types, missing values, and distributions.
- Learning Opportunity: Practice exploratory data analysis (EDA) to discover patterns and relationships.
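The exploration steps above could be sketched with pandas roughly as follows. The column names and the tiny in-memory sample are made up for illustration; with the real data, the sample would be replaced by something like `pd.read_csv("obesity.csv")`, where the filename is hypothetical.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing count, and distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })


# Made-up sample standing in for the real file (assumed column names).
sample = pd.DataFrame({
    "Age": [21, 35, None, 48],
    "Height": [170, 165, 180, 158],
    "Physical_Activity": ["None", "1-2 days/week", "3-4 days/week", "None"],
    "Class": ["Normal", "Overweight", "Normal", "Obesity"],
})

summary = summarize(sample)
print(summary)

# Distribution of the target column.
print(sample["Class"].value_counts())
```

On the real dataset, `df.describe()` and `df.info()` give a similar overview with less code; the helper above just collects the three checks into one table.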
Step 3: Data Visualization
- Focus Areas:
  - Analyze obesity class distributions.
  - Explore correlations between lifestyle factors and obesity:
    - Age and obesity class.
    - Impact of fast food consumption.
    - Relationship with physical activity levels.
- Visualization Tools: Matplotlib, Seaborn.
- Pro Tip: Use a correlation heatmap to highlight key relationships.
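A correlation heatmap along these lines might look like the sketch below. It assumes the categorical answers have already been integer-coded, and it uses Spearman rank correlation because the codes are ordinal; both choices, the column names, and the data rows are assumptions made for illustration, not taken from the project.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up, integer-coded rows for illustration only.
df = pd.DataFrame({
    "Age": [19, 24, 31, 38, 45, 52, 27, 60],
    "Fast_Food": [0, 1, 1, 1, 0, 1, 0, 1],              # 0 = no, 1 = yes
    "Physical_Activity": [4, 3, 1, 0, 2, 0, 4, 1],      # 0 = none ... 4 = most active
    "Class": [1, 2, 3, 4, 2, 4, 1, 3],                  # 1 = Underweight ... 4 = Obesity
})

# Rank-based correlation between every pair of columns.
corr = df.corr(method="spearman")

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Spearman correlation (illustrative data)")
fig.tight_layout()
```

Fixing `vmin` and `vmax` to -1 and 1 keeps the color scale symmetric, so positive and negative relationships of equal strength look equally intense. Call `plt.show()` or `fig.savefig(...)` to view the result.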
Step 4: Machine Learning Implementation
Data Preparation:
- Split data into training and testing sets.
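- Standardize features so that scale-sensitive models (such as K-Nearest Neighbors and Support Vector Regression) are not dominated by features with large numeric ranges.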
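The two preparation steps could be sketched with scikit-learn as below. The data is randomly generated as a stand-in for the real features, and the 80/20 split, the stratification, and the random seeds are assumptions rather than settings taken from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in features: age (years), height (cm), coded activity level.
X = np.column_stack([
    rng.integers(18, 65, size=200),
    rng.integers(150, 195, size=200),
    rng.integers(0, 5, size=200),
]).astype(float)
y = rng.integers(1, 5, size=200)  # synthetic coded obesity class, 1 to 4

# Hold out 20% for testing; stratify keeps class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training data only, then apply it to both parts,
# so no information from the test set leaks into preprocessing.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(3), X_train_scaled.std(axis=0).round(3))
```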
Modeling:
- Train and evaluate multiple algorithms:
  - Linear Regression
  - Random Forest
  - K-Nearest Neighbors
  - Decision Tree
  - Support Vector Regression
- Compare performance using metrics like RMSE and \(R^2\).
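One way to run this comparison is sketched below. Because RMSE and R² are regression metrics, the sketch uses the regressor variant of each listed algorithm and assumes the obesity class has been coded as a number; the target here is synthetic, and the hyperparameters are scikit-learn defaults rather than the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data: a numeric target that depends linearly on two of four features.
X = rng.normal(size=(300, 4))
y = 2.5 + 0.8 * X[:, 0] - 0.6 * X[:, 1] + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "K-Nearest Neighbors": KNeighborsRegressor(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Support Vector Regression": SVR(),
}

results = {}
for name, model in models.items():
    # Scaling lives inside the pipeline, so it is fitted on training data only.
    pipeline = make_pipeline(StandardScaler(), model)
    pipeline.fit(X_train, y_train)
    pred = pipeline.predict(X_test)
    results[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }

# Print models from lowest to highest test RMSE.
for name, scores in sorted(results.items(), key=lambda kv: kv[1]["rmse"]):
    print(f"{name:<26} RMSE={scores['rmse']:.3f}  R2={scores['r2']:.3f}")
```

Lower RMSE and higher R² indicate a better fit. The ranking on this synthetic data says nothing about how the models rank on the survey data.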
Prediction Insights:
- Understand which factors have the most predictive power.
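One common way to estimate predictive power is the impurity-based feature importance of a fitted random forest, sketched below. This is a swapped-in illustration, not necessarily the method used in this project; the column names and the synthetic target (built so that activity matters most and smoking not at all) are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 400

# Synthetic, integer-coded features with assumed column names.
X = pd.DataFrame({
    "Age": rng.integers(18, 65, size=n),
    "Physical_Activity": rng.integers(0, 5, size=n),
    "Fast_Food": rng.integers(0, 2, size=n),
    "Smoking": rng.integers(0, 2, size=n),
})

# Synthetic target: driven by activity and fast food only, plus a little noise.
y = (
    2.5
    - 0.5 * X["Physical_Activity"]
    + 0.8 * X["Fast_Food"]
    + rng.normal(scale=0.2, size=n)
)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Importances sum to 1; larger values mean the feature drove more of the splits.
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances.round(3))
```

Impurity-based importances can overstate features with many distinct values; `sklearn.inspection.permutation_importance` on held-out data is a useful cross-check.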
Step 5: Answering Key Questions
Leverage machine learning and EDA to address these critical health-related questions:
- Does eating more vegetables reduce obesity?
- Are individuals eating more than three meals a day more likely to be obese?
- How do obesity rates differ between smokers and non-smokers?
- How does obesity vary across age groups?
- Do low physical activity and high fast food consumption lead to higher obesity rates?
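Questions like these usually come down to comparing obesity rates across groups. The sketch below does this for smokers versus non-smokers with pandas; the eight rows are made up to show the mechanics and carry no evidence about the real answer, and the column names are assumptions.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Smoking": ["Yes", "No", "No", "Yes", "No", "Yes", "No", "No"],
    "Class": [
        "Obesity", "Normal", "Overweight", "Overweight",
        "Normal", "Obesity", "Obesity", "Normal",
    ],
})

# Share of respondents in the "Obesity" class within each smoking group.
df["Is_Obese"] = df["Class"].eq("Obesity")
rates = df.groupby("Smoking")["Is_Obese"].mean()
print(rates)

# Full class breakdown per group; each row sums to 1.
table = pd.crosstab(df["Smoking"], df["Class"], normalize="index")
print(table.round(2))
```

The same pattern applies to the other questions by swapping `Smoking` for the vegetable, meal-count, age-group, or activity column. A difference in rates shows association only, not that one factor causes the other.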
Step 6: Insights and Recommendations
- Findings:
  - Relationships between lifestyle choices and obesity.
  - The role of dietary habits and physical activity in maintaining a healthy weight.
  - Age-based obesity trends and risk factors.
- Recommendations:
  - Encourage healthy eating habits.
  - Promote regular physical activity.
  - Target interventions based on age-specific trends.