Beyond Syntax: How I Used 'Vibe Coding' to Build an Enterprise-Grade Data Cleaning Pipeline in Google Sheets

Beyond Syntax: How I Used 'Vibe Coding' to Build an Enterprise-Grade Data Cleaning Pipeline in Google Sheets

Beyond Syntax: How I Used 'Vibe Coding' to Build an Enterprise-Grade Data Cleaning Pipeline in Google Sheets

Meta Description: Discover how Vibe Coding transforms data analysis. See how I automated a complex CRM data cleaning pipeline in Google Sheets using AI-assisted Google Apps Script, reducing hours of manual work to seconds.


The Business Challenge: CRM Data Normalization Nightmare

Every data analyst knows the pain. You have three different data sources, and they all speak a different language.

For this project, the objective was clear but the path was messy. We needed to consolidate monthly sales reports from three major automotive CRMs into a single, unified format for Meta Conversion API (CAPI) upload.

The Constraints:

  • Source 1 (VinSolutions): Inconsistent address columns and date formats.
  • Source 2 (DriveCentric): Missing "Gross Profit" columns and messy headers.
  • Source 3 (eLead): The header row wasn't even on the first line (it was on row 5), and names were combined in a single cell (Last, First).
  • The Goal: A clean CSV with 15 specific columns (e.g., event_time, event_name, value, order_id) formatted strictly (dates as YYYY/MM/DD, Zips trimmed to 5 digits, Phones normalized to +1XXXXXXXXXX).

Manual cleaning would take hours per file. Using standard Google Apps Script Automation, I built a tool that does it in seconds.


What is Vibe Coding in the Context of Data Analytics?

Before diving into the code, we must define the methodology. Vibe Coding is continuous iteration with an LLM to produce code that feels right—it handles edge cases, includes error logging, and creates a user-friendly UI.

Unlike the generic snippets found in articles like 5 JavaScript Data Cleaning Patterns, Vibe Coding allows us to handle context. It allows us to say, "The Zip code column keeps reverting to currency format because of ghost formatting; fix it."

The result is not just a script; it is a software application living inside a spreadsheet.


The Solution: A Universal Bulk Automation Tool

The final product was a menu-driven interface inside Google Sheets entitled "⭐ Drive Manager." It allows the user to:

  1. Scan Google Drive folders for the latest files.
  2. Detect which CRM the file belongs to (Vin, DriveCentric, or eLead).
  3. Apply the specific cleaning logic required for that CRM.
  4. Calculate average gross profit to fill in missing data gaps.
  5. Export the clean data to a tracking sheet.

1. The Traffic Controller (CRM Detection)

The core of the script is a function I call the "Traffic Controller." It reads a "CRM" column in our master list and routes the file to the correct cleaning engine.

// Pseudo-code representation of the routing logic
if (crm.includes("vin")) {
   cleanVinSheet(targetSheet, avgGross);
} 
else if (crm.includes("drive")) {
   cleanDriveCentricSheet(targetSheet, avgGross);
} 
else if (crm.includes("elead")) {
   cleanEleadSheet(targetSheet, avgGross);
} 

This ensures that we aren't applying VinSolutions logic (which expects headers on row 1) to eLead files (where headers are on row 5).

2. The "Nuclear Option" for Data Hygiene

One of the most persistent issues in Excel to Google Sheets workflows is "ghost formatting." We found that when converting Total Gross (Currency) columns into Zip Code columns during column reordering, the spreadsheet would force the Zip codes into currency format (e.g., turning 65010 into $65,010.00).

Through Vibe Coding, we developed reorderColumnsNuclear.

Instead of simply moving columns, this function:

  • Reads the data into memory.
  • Wipes the entire sheet clean (Content and Formatting).
  • Sets the entire sheet format to @ (Plain Text) to create a "defense shield" against auto-formatting.
  • Writes the data back.
  • Selectively applies Date formatting (YYYY/MM/DD) only to the specific date column.

3. Handling Edge Cases: The Logic Injection

  • The "Avg Gross" Injection: We encountered files where the Value column was empty or contained negative numbers (which breaks Meta ads optimization). The script automatically fetches a calculated "Average Gross" from a separate "Tracker" tab and injects it into any cell with missing or invalid data.
  • Phone Number Normalization: A Regex cleaner iterates through Cell, Home, and Work phones, stripping non-integers and formatting them to +1... for perfect matching rates.
  • Zip Trimming: eLead files often included 9-digit zips (e.g., 12345-6789). The script detects this and trims them to the standard 5 digits required for matching.

The Results

  • Efficiency: Reduced monthly reporting time from ~4 hours to < 5 minutes.
  • Accuracy: Eliminated human error in column mapping and date formatting.
  • Scalability: The script handles bulk operations, processing 50+ files in a single run without user intervention.

Conclusion

Data cleaning doesn't have to be a chore. With the right approach to AI-Assisted Google Sheets Automation, we can build tools that are robust, scalable, and incredibly fast. "Vibe Coding" isn't just a buzzword; it's a methodology that allows data analysts to deliver value exponentially faster.

Let's Automate Your Workflow

Are you looking for a Data Analyst who combines technical depth with modern AI workflows to solve business problems?

Connect with me on LinkedIn

Or email me directly at: henrymarinho90@gmail.com

GitHub Repo Live Demo