Today felt like a shift from just learning concepts to actually building something practical.
Instead of focusing on small exercises, I worked on a real problem I’ve run into many times as a WordPress and Laravel developer: messy WooCommerce product CSV files.
If you’ve ever imported products into WooCommerce, you probably know the pain:
- missing SKUs
- duplicate entries
- inconsistent formatting
- invalid stock statuses
- missing prices
These issues don’t just cause minor errors: they can break imports, corrupt inventory counts, and create real problems in production.
So instead of fixing data manually inside WooCommerce, I wanted to solve the problem before the data even gets imported.
The Idea
I decided to build a simple Python tool that follows this flow:
CSV → Clean → Validate → Separate → Export → Report
The goal wasn’t to memorize Python syntax.
The goal was to solve a real problem using Python and AI as a coding partner.
How I Built It (Step-by-Step)
Instead of building everything at once, I approached this in small steps.
Step 1 — Reading the CSV File
I started by loading the CSV file using Pandas:
```python
import pandas as pd

df = pd.read_csv("data/sample_products.csv")
print(df.head())
```
This helped me understand the structure of the data before doing anything else.
👉 At this point, I could already see the columns:
- name
- sku
- regular_price
- stock_status
- description
- categories
Step 2 — Cleaning the Data
Next, I cleaned the data to make it consistent:
```python
# Normalize whitespace and casing so later comparisons are reliable
df["name"] = df["name"].astype(str).str.strip()
df["sku"] = df["sku"].astype(str).str.strip().str.upper()
# Remove inner spaces too, so "In Stock" becomes "instock"
df["stock_status"] = df["stock_status"].astype(str).str.strip().str.lower().str.replace(" ", "")
```
This step fixes things like:
- `" tb-001 "` → `"TB-001"`
- `"In Stock"` → `"instock"`
At this stage, I wasn’t rejecting anything yet; I was just fixing what could be fixed.
Step 3 — Validating the Data
This is where things got interesting.
I defined what “bad data” means:
```python
# A row is only trustworthy if every check below passes
valid_stock_status = ["instock", "outofstock", "onbackorder"]

df["missing_name"] = df["name"] == ""
df["missing_sku"] = df["sku"] == ""
df["missing_price"] = df["regular_price"].isna()
df["invalid_stock"] = ~df["stock_status"].isin(valid_stock_status)
df["duplicate_sku"] = df.duplicated(subset=["sku"], keep=False)  # keep=False flags every copy
```
Instead of asking:
“Is this data correct?”
I started asking:
“Can my system trust this data?”
If not, it gets rejected.
Step 4 — Separating Clean and Invalid Data
I split the dataset into two:
```python
# Combine the individual flags into a single "is this row invalid?" mask
flag_cols = ["missing_name", "missing_sku", "missing_price", "invalid_stock", "duplicate_sku"]
invalid_mask = df[flag_cols].any(axis=1)

valid_df = df[~invalid_mask]
invalid_df = df[invalid_mask]
```
- `clean_products.csv` → valid rows
- `rejected_products.csv` → invalid rows
This makes it easy to continue working only with clean data.
Step 5 — Adding Rejection Reasons
Instead of guessing what went wrong, I added a reason:
- Missing SKU; Invalid stock status
- Duplicate SKU
- Missing price
This turns the script into something usable, not just technical.
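One way to build such a reason column is to map each flag to a label and join the labels per row. A minimal sketch, assuming the boolean flag columns from the validation step (the sample rows here are made up):

```python
import pandas as pd

# Sample rows with the boolean flag columns produced by the validation step
df = pd.DataFrame({
    "sku": ["TB-001", "", "TB-002"],
    "missing_sku": [False, True, False],
    "missing_price": [False, False, True],
    "invalid_stock": [False, True, False],
    "duplicate_sku": [False, False, False],
})

# Map each flag column to a human-readable label
reasons = {
    "missing_sku": "Missing SKU",
    "missing_price": "Missing price",
    "invalid_stock": "Invalid stock status",
    "duplicate_sku": "Duplicate SKU",
}

# Join the labels of every flag that is True for the row
df["rejection_reason"] = df[list(reasons)].apply(
    lambda row: "; ".join(reasons[col] for col in reasons if row[col]), axis=1
)

print(df[["sku", "rejection_reason"]])
```

Joining with `"; "` produces the `Missing SKU; Invalid stock status` style shown above, and an empty string for clean rows.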
Step 6 — Exporting Results
Finally, I exported the results:
```python
valid_df.to_csv("output/clean_products.csv", index=False)
invalid_df.to_csv("output/rejected_products.csv", index=False)
```
Now I have:
- a clean dataset ready for import
- a rejected dataset for fixing
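One practical detail: `to_csv` raises an error if the `output/` directory doesn’t exist yet, so it’s worth creating it first. A small sketch (the one-row `valid_df` here is made up for illustration):

```python
from pathlib import Path
import pandas as pd

# to_csv fails if the target directory is missing, so create it up front
out_dir = Path("output")
out_dir.mkdir(parents=True, exist_ok=True)

valid_df = pd.DataFrame({"sku": ["TB-001"], "regular_price": [19.99]})
valid_df.to_csv(out_dir / "clean_products.csv", index=False)
```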
Step 7 — Generating a Summary Report
I also generated a simple report:
```
Total rows: 100
Valid rows: 72
Invalid rows: 28
Missing price: 8
Duplicate SKU: 10
```
This gives a quick overview of the data quality.
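The report can be assembled directly from the flag columns. A minimal sketch with a made-up five-row dataset (the column names follow the validation step above):

```python
import pandas as pd

# Toy dataset with two validation flags already computed
df = pd.DataFrame({
    "missing_price": [True, False, False, True, False],
    "duplicate_sku": [False, True, True, False, False],
})

flag_cols = ["missing_price", "duplicate_sku"]
invalid_mask = df[flag_cols].any(axis=1)

# Count totals and per-flag failures
report = {
    "Total rows": len(df),
    "Valid rows": int((~invalid_mask).sum()),
    "Invalid rows": int(invalid_mask.sum()),
    "Missing price": int(df["missing_price"].sum()),
    "Duplicate SKU": int(df["duplicate_sku"].sum()),
}

for label, count in report.items():
    print(f"{label}: {count}")
```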
What I Learned
This project changed how I approach learning Python.
Instead of memorizing syntax, I focused on:
- defining a real problem
- using AI to generate code
- reviewing and refining the logic
As a developer, I realized:
The real value is not in writing code from memory; it’s in understanding the system and the data.
Real-World Applications
This type of tool is useful for:
- WooCommerce product imports
- ERP integrations
- API data validation pipelines
- preparing datasets for AI systems
It’s a simple project, but it solves a real problem I’ve encountered many times.
What’s Next
In Part 2, I’ll take this further by:
- turning this script into a simple web app
- allowing users to upload a CSV file
- automatically returning cleaned results
This is just the beginning, but it’s a solid step toward building real tools with Python and AI.
