Today felt like a shift from just learning concepts to actually building something practical.
Instead of focusing on small exercises, I worked on a real problem I’ve run into many times as a WordPress and Laravel developer: messy WooCommerce product CSV files.
If you’ve ever imported products into WooCommerce, you probably know the pain:
- missing SKUs
- duplicate entries
- inconsistent formatting
- invalid stock statuses
- missing prices
These issues don’t just cause minor errors: they can break imports, corrupt inventory counts, and create real problems in production.
So instead of fixing data manually inside WooCommerce, I wanted to solve the problem before the data even gets imported.
The Idea
I decided to build a simple Python tool that follows this flow:
CSV → Clean → Validate → Separate → Export → Report
The goal wasn’t to memorize Python syntax.
The goal was to solve a real problem using Python and AI as a coding partner.
How I Built It (Step-by-Step)
Instead of building everything at once, I approached this in small steps.
Step 1 — Reading the CSV File
I started by loading the CSV file using Pandas:
```python
import pandas as pd

df = pd.read_csv("data/sample_products.csv")
print(df.head())
```
This helped me understand the structure of the data before doing anything else.
👉 At this point, I could already see the columns:
- name
- sku
- regular_price
- stock_status
- description
- categories
Step 2 — Cleaning the Data
Next, I cleaned the data to make it consistent:
```python
# Normalize whitespace and casing so later comparisons are reliable
df["name"] = df["name"].astype(str).str.strip()
df["sku"] = df["sku"].astype(str).str.strip().str.upper()
# Remove inner spaces too, so "In Stock" becomes "instock"
df["stock_status"] = df["stock_status"].astype(str).str.strip().str.lower().str.replace(" ", "")
```
This step fixes things like:
- `" tb-001 "` → `"TB-001"`
- `"In Stock"` → `"instock"`
At this stage, I wasn’t rejecting anything yet; I was just fixing what could be fixed.
Step 3 — Validating the Data
This is where things got interesting.
I defined what “bad data” means:
```python
# A row is only trustworthy if every check below passes
valid_stock_status = ["instock", "outofstock", "onbackorder"]

df["missing_name"] = df["name"] == ""
df["missing_sku"] = df["sku"] == ""
df["missing_price"] = df["regular_price"].isna()
df["invalid_stock"] = ~df["stock_status"].isin(valid_stock_status)
df["duplicate_sku"] = df.duplicated(subset=["sku"], keep=False)  # keep=False flags every copy
```
Instead of asking:
“Is this data correct?”
I started asking:
“Can my system trust this data?”
If not, it gets rejected.
Step 4 — Separating Clean and Invalid Data
I split the dataset into two:
```python
# Combine the individual flags into a single "is this row invalid?" mask
flag_cols = ["missing_name", "missing_sku", "missing_price", "invalid_stock", "duplicate_sku"]
invalid_mask = df[flag_cols].any(axis=1)

valid_df = df[~invalid_mask]
invalid_df = df[invalid_mask]
```
- `clean_products.csv` → valid rows
- `rejected_products.csv` → invalid rows
This makes it easy to continue working only with clean data.
Step 5 — Adding Rejection Reasons
Instead of guessing what went wrong, I added a reason:
- Missing SKU; Invalid stock status
- Duplicate SKU
- Missing price
This turns the script into something usable, not just technical.
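One way to build such a reason column is to map each flag to a label and join the labels per row. A minimal sketch, assuming the boolean flag columns from the validation step (the sample rows here are made up):

```python
import pandas as pd

# Sample rows with the boolean flag columns produced by the validation step
df = pd.DataFrame({
    "sku": ["TB-001", "", "TB-002"],
    "missing_sku": [False, True, False],
    "missing_price": [False, False, True],
    "invalid_stock": [False, True, False],
    "duplicate_sku": [False, False, False],
})

# Map each flag column to a human-readable label
reasons = {
    "missing_sku": "Missing SKU",
    "missing_price": "Missing price",
    "invalid_stock": "Invalid stock status",
    "duplicate_sku": "Duplicate SKU",
}

# Join the labels of every flag that is True for the row
df["rejection_reason"] = df[list(reasons)].apply(
    lambda row: "; ".join(reasons[col] for col in reasons if row[col]), axis=1
)

print(df[["sku", "rejection_reason"]])
```

Joining with `"; "` produces the `Missing SKU; Invalid stock status` style shown above, and an empty string for clean rows.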
Step 6 — Exporting Results
Finally, I exported the results:
```python
valid_df.to_csv("output/clean_products.csv", index=False)
invalid_df.to_csv("output/rejected_products.csv", index=False)
```
Now I have:
- a clean dataset ready for import
- a rejected dataset for fixing
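One practical detail: `to_csv` raises an error if the `output/` directory doesn’t exist yet, so it’s worth creating it first. A small sketch (the one-row `valid_df` here is made up for illustration):

```python
from pathlib import Path
import pandas as pd

# to_csv fails if the target directory is missing, so create it up front
out_dir = Path("output")
out_dir.mkdir(parents=True, exist_ok=True)

valid_df = pd.DataFrame({"sku": ["TB-001"], "regular_price": [19.99]})
valid_df.to_csv(out_dir / "clean_products.csv", index=False)
```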
Step 7 — Generating a Summary Report
I also generated a simple report:
```
Total rows: 100
Valid rows: 72
Invalid rows: 28
Missing price: 8
Duplicate SKU: 10
```
This gives a quick overview of the data quality.
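The report can be assembled directly from the flag columns. A minimal sketch with a made-up five-row dataset (the column names follow the validation step above):

```python
import pandas as pd

# Toy dataset with two validation flags already computed
df = pd.DataFrame({
    "missing_price": [True, False, False, True, False],
    "duplicate_sku": [False, True, True, False, False],
})

flag_cols = ["missing_price", "duplicate_sku"]
invalid_mask = df[flag_cols].any(axis=1)

# Count totals and per-flag failures
report = {
    "Total rows": len(df),
    "Valid rows": int((~invalid_mask).sum()),
    "Invalid rows": int(invalid_mask.sum()),
    "Missing price": int(df["missing_price"].sum()),
    "Duplicate SKU": int(df["duplicate_sku"].sum()),
}

for label, count in report.items():
    print(f"{label}: {count}")
```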
What I Learned
This project changed how I approach learning Python.
Instead of memorizing syntax, I focused on:
- defining a real problem
- using AI to generate code
- reviewing and refining the logic
As a developer, I realized:
The real value is not in writing code from memory; it’s in understanding the system and the data.
Real-World Applications
This type of tool is useful for:
- WooCommerce product imports
- ERP integrations
- API data validation pipelines
- preparing datasets for AI systems
It’s a simple project, but it solves a real problem I’ve encountered many times.
What’s Next
In Part 2, I’ll take this further by:
- turning this script into a simple web app
- allowing users to upload a CSV file
- automatically returning cleaned results
This is just the beginning, but it’s a solid step toward building real tools with Python and AI.
