Marc Linus Rosales

Aspiring Data Professional

I enjoy building machine learning models and extracting meaningful insights from data to solve real-world problems.

About

🎓 Education: Computer Science, Batangas State University (2022-2026) | Consistent Dean’s Lister 🏆

🚀 Goal: Passionate about transforming raw data into insights as a Data Analyst, Engineer, or Scientist.

📊 Interests: Data visualization, SQL optimization, and predictive modeling.

🏀 Fun Fact: I've analyzed everything from bicycle rentals to NBA stats!

Skills

Programming & Frameworks: Python (Pandas, NumPy, Scikit-Learn), C/C++, Java

Databases & Data Analysis: PostgreSQL, MySQL, MariaDB, Google Firebase, SQL; Microsoft Power BI, Microsoft Excel, Matplotlib, Seaborn

Workflow & Version Control: Git, GitHub, Docker, Dagster, Airflow

Personal Projects

February 2025

Senator Election Prediction Using YouTube Comments

Built a data pipeline to extract, clean, and store YouTube comments in PostgreSQL, following Kimball’s Data Warehousing principles. Applied BERT-based NLP for sentiment analysis and Named Entity Recognition. Used PCA & K-Means to cluster sentiment patterns and predict 12 likely senators. Designed a scalable architecture for real-time political trend analysis.

Pandas RandomForest Matplotlib Seaborn Power BI
View on GitHub
August 2024

Bicycle Rental Business Analysis

Achieved 84.56% accuracy in predicting optimal bicycle rental locations based on temperature ranges. Built a Power BI dashboard and a scalable ETL pipeline with Docker, APIs, Pandas, PostgreSQL, and Dagster, scaling data collection from 1,000 to 47,000 rows per day. Automated workflows to enhance efficiency and support business expansion.

Pandas RandomForest Matplotlib Seaborn Power BI
View on GitHub
January 2024

Calories Burned Analysis

Revealed high-impact calorie burn from activities like racing and fast-paced running (1.77-1.99 cal/kg). Converted the CSV data to a SQLite database and analyzed it to determine the most efficient calorie-burning exercises. Visualized key results to highlight the top calorie-burning activities.

Python SQLAlchemy Pandas Matplotlib Seaborn
View on GitHub
December 2023

Household Income of Filipino Families Analysis

Analyzed spending habits and economic differences across regions in the Philippines. Highlighted disparities in food spending, with ARMM dedicating 48.19% of income to food versus NCR's 30.29%. Suggested industries to improve the economic landscape based on regional education and income data.

Pandas Matplotlib Seaborn Power BI
View on GitHub
October 2023

NBA Analysis

Identified critical factors, such as +3.3 rebounds and +3.08 assists, contributing to team victories. Extracted and cleaned 25,000 rows of game data, handling null values before analysis. Visualized key insights impacting team performance for home and away games.

Pandas Matplotlib Seaborn
View on GitHub