Automotive Manufacturing Defect Detection AI

The situation before we got involved

Automotive manufacturing doesn't tolerate defects. A single faulty component reaching an assembly line downstream can halt production, trigger a warranty claim, or - in the worst case - end up in a recall that costs far more than the part was ever worth. Quality control at the stamping, casting, and machined-parts level is therefore not optional; the question is only how well it works.

For this Tier 1 supplier, that meant a team of visual inspectors stationed at the end of each production line, eyeballing roughly 60,000 units a day across eight lines. The process was manual and shift-dependent, and inspectors were fatigued by hour four of a nine-hour run. The supplier's defect escape rate - the share of shipped units that were defective but left the facility undetected - sat at around 2%. At production volume, that translated to roughly 1,200 defective parts shipped every day.

Their OEM customers had started flagging it. Two recall events in eighteen months had damaged the relationship with their largest account. The plant manager knew the manual inspection process was the root cause and had already explored off-the-shelf machine vision systems, but nothing he'd seen could handle the variation in part geometry across their product range.

"We weren't looking for a system that caught 90% of defects. Our OEM customers were already raising flags. We needed something that could genuinely replace a human inspector - across every part type, every shift, every line."

Why this was harder than it looked

Automated visual inspection sounds like a solved problem. Off-the-shelf systems exist. The challenge here was in the specifics - three things that ruled out every vendor solution we evaluated before committing to a custom build.

Part geometry variation across the product range

The supplier manufactured over 340 distinct part numbers across the eight lines. Each part has a different surface profile, reflectance characteristic, and set of acceptable defect tolerances. A surface scratch acceptable on a structural bracket is a rejection on a visible-face trim component. Any inspection system had to handle that variation in real time - not with a different setup for every SKU, but dynamically, based on what was running on the line at any given moment.

Line speed and latency constraints

Parts move through the inspection station at line speed. The system had a window of under 200 milliseconds per part to capture, process, and issue a rejection signal before the part moved past the pneumatic ejector mechanism. That is not enough time to send image data off-site, run inference in the cloud, and receive a result. Everything had to run at the edge.
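The budget argument can be sketched as a simple sum. Only the 200 ms ceiling comes from the constraint above; every stage name and timing below is a hypothetical placeholder rather than a measurement from this deployment, and the network figure is purely illustrative.

```python
# Per-part latency budget check. Only BUDGET_MS reflects the case study;
# the stage timings are hypothetical placeholders, not measured values.
BUDGET_MS = 200.0


def remaining_budget_ms(stage_ms: dict, budget_ms: float = BUDGET_MS) -> float:
    """Return the headroom left after all stages; negative means the part is missed."""
    return budget_ms - sum(stage_ms.values())


edge_stages = {
    "capture": 15.0,
    "preprocess": 30.0,
    "inference": 90.0,
    "aggregation": 5.0,
    "eject_signal": 2.0,
}

# Same pipeline with an assumed round trip to a remote inference service added.
cloud_stages = {**edge_stages, "network_round_trip": 150.0}

print(remaining_budget_ms(edge_stages))   # 58.0
print(remaining_budget_ms(cloud_stages))  # -92.0
```

Whatever the real per-stage numbers are, the point is the same: any fixed network round trip comes straight out of the headroom, which is why inference stays on the edge node.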

Lighting and surface conditions

Stamped and cast metal parts have surfaces that are challenging for vision systems. Oil films from forming lubricants, heat-induced discolouration, and the transition between machined and as-cast faces all create patterns that trigger false positives in models trained on clean data. Getting the false positive rate low enough to avoid operator alert fatigue - while keeping the true defect detection rate high - required careful work on both the imaging setup and the training dataset.

How the system works

The production architecture runs across five stages, each refined significantly from the original specification as we encountered real factory conditions.

System architecture - production (simplified)

Multi-Angle Image Capture

Each inspection station has 12 industrial cameras arranged in a fixed multi-angle rig - 4 top-down, 4 oblique at 45°, and 4 side-facing. All 12 cameras fire simultaneously on a trigger pulse tied to the conveyor encoder. This gives the model a complete surface scan of every part regardless of orientation on the belt.

12-camera rig | Encoder trigger | Strobe lighting
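As a rough software model of the encoder-tied trigger: the real rig presumably does this in hardware, and the `counts_per_part` value is a hypothetical parameter, not a figure from the deployment. The sketch fires once each time the conveyor advances by one part pitch.

```python
class EncoderTrigger:
    """Fire once each time the conveyor advances by one part pitch."""

    def __init__(self, counts_per_part: int) -> None:
        if counts_per_part <= 0:
            raise ValueError("counts_per_part must be positive")
        self.counts_per_part = counts_per_part
        self._next_fire = counts_per_part

    def update(self, encoder_count: int) -> bool:
        """Return True if all cameras should fire at this encoder reading."""
        if encoder_count < self._next_fire:
            return False
        # Skip past any pitches missed between polls so we never double-fire.
        while self._next_fire <= encoder_count:
            self._next_fire += self.counts_per_part
        return True


trigger = EncoderTrigger(counts_per_part=1000)  # hypothetical pitch
readings = [0, 400, 999, 1000, 1500, 2100, 2100]
print([trigger.update(count) for count in readings])
# [False, False, False, True, False, True, False]
```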

On-Device Pre-processing & Part Classification

Raw frames land on a Jetson Orin node co-located with each inspection station. An OpenCV pre-processing pipeline normalises exposure, strips the conveyor background, and crops to the part bounding box. A lightweight lookup step resolves the part number from a QR code scanned upstream - this tells the inference engine which defect thresholds to apply.

OpenCV pipeline | QR-based SKU ID | TensorRT pre-proc
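The SKU lookup step might look something like the sketch below. The QR payload format (`PART|LOT`), the part numbers, and the threshold values are all assumptions made for illustration; the only thing taken from the text is that the same defect can be acceptable on one part and a rejection on another.

```python
from dataclasses import dataclass


@dataclass
class SkuProfile:
    part_number: str
    # Defect class -> aggregated confidence above which the part is rejected.
    thresholds: dict


# Hypothetical registry; a real one would cover all 340+ part numbers.
PROFILES = {
    "BRKT-0001": SkuProfile("BRKT-0001", {"surface_scratch": 0.90, "surface_crack": 0.50}),
    "TRIM-0002": SkuProfile("TRIM-0002", {"surface_scratch": 0.40, "surface_crack": 0.50}),
}


def profile_for_qr(payload: str) -> SkuProfile:
    """Resolve the inspection profile from an upstream QR payload."""
    part_number = payload.strip().split("|")[0]  # assumed "PART|LOT" layout
    try:
        return PROFILES[part_number]
    except KeyError:
        raise LookupError(f"no inspection profile for part {part_number!r}") from None


print(profile_for_qr("TRIM-0002|LOT42").thresholds["surface_scratch"])  # 0.4
```

Here the structural bracket tolerates a much higher scratch confidence than the visible-face trim part before rejecting, and an unknown part number fails loudly rather than falling back to a default profile.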

YOLOv8 Defect Inference

A custom-trained YOLOv8 model runs on-device via TensorRT. The model was trained on 180,000 labelled images across 14 defect categories - surface cracks, porosity voids, dimensional deformation, edge burrs, coating gaps, and nine further classes specific to this product range. For each of the 12 camera views, the model outputs a per-defect confidence score and a localisation bounding box.

YOLOv8 custom | TensorRT INT8 | 14 defect classes

Multi-View Confidence Aggregation

A thin aggregation layer combines the confidence scores from all 12 views using a weighted voting scheme. Views that have higher historical accuracy for a given defect type carry more weight. If the aggregated confidence for any defect class exceeds the SKU-specific threshold, a rejection signal is issued. The full result set - confidence scores, defect class, bounding boxes, all 12 images - is logged to a local PostgreSQL instance and mirrored to S3.

Weighted voting | Per-SKU thresholds | Full audit trail
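One plausible reading of the weighted voting step is a weighted mean of per-view confidences, compared against the per-SKU threshold for each defect class. The exact formula is not given here, so the weighted mean, the view names, and all numbers below are assumptions; three views stand in for the twelve to keep the example short.

```python
def aggregate_confidence(view_scores: dict, view_weights: dict) -> float:
    """Weighted mean of per-view confidences for a single defect class."""
    total_weight = sum(view_weights[view] for view in view_scores)
    if total_weight <= 0:
        raise ValueError("view weights must sum to a positive value")
    weighted = sum(score * view_weights[view] for view, score in view_scores.items())
    return weighted / total_weight


def rejected_classes(scores_by_class: dict, weights_by_class: dict, thresholds: dict) -> list:
    """Defect classes whose aggregated confidence exceeds the SKU threshold."""
    return [
        defect
        for defect, view_scores in scores_by_class.items()
        if aggregate_confidence(view_scores, weights_by_class[defect]) > thresholds[defect]
    ]


# Hypothetical per-view confidences for one part.
scores = {
    "porosity": {"top": 0.10, "oblique": 0.90, "side": 0.20},
    "edge_burr": {"top": 0.05, "oblique": 0.10, "side": 0.15},
}
# Hypothetical weights: oblique views have historically been best at porosity.
weights = {
    "porosity": {"top": 0.5, "oblique": 3.0, "side": 0.5},
    "edge_burr": {"top": 1.0, "oblique": 1.0, "side": 1.0},
}
thresholds = {"porosity": 0.60, "edge_burr": 0.60}

print(rejected_classes(scores, weights, thresholds))  # ['porosity']
```

With equal weights the porosity score would average out to 0.4 and slip under the threshold, which is the case the per-defect view weighting is meant to prevent.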

Real-Time Rejection & Operator Dashboard

A 24V signal to the pneumatic ejector fires within 2ms of a rejection decision - comfortably within the conveyor timing window. Rejected parts land in a labelled bin with a printed rejection slip showing the defect class and camera view. Line supervisors have a live Grafana dashboard showing current defect rate, rejection volume by defect type, and a rolling alert if the escape rate estimate trends above threshold.

<2ms rejection | MQTT signal bus | Grafana live dash
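A rejection event on the signal bus could be serialised along these lines. The field names, topic layout, and example values are hypothetical; the sketch only builds the topic and payload and deliberately leaves out the MQTT client itself.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RejectionEvent:
    line: int
    part_number: str
    defect_class: str
    camera_view: str
    confidence: float


def to_mqtt_message(event: RejectionEvent, topic_prefix: str = "plant/inspection"):
    """Return a (topic, payload) pair; the topic scheme is an assumption."""
    topic = f"{topic_prefix}/line{event.line}/rejections"
    payload = json.dumps(asdict(event), sort_keys=True)
    return topic, payload


event = RejectionEvent(3, "TRIM-0002", "edge_burr", "oblique_2", 0.87)
topic, payload = to_mqtt_message(event)
print(topic)  # plant/inspection/line3/rejections
print(payload)
```

The same record carries what the printed rejection slip needs (defect class and camera view), so the slip, the dashboard, and the audit log can all be fed from one message.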

What went wrong, and what we learned

We had two significant setbacks during the build. Documenting them honestly is more useful than a sanitised version of events.

False start #1 - Insufficient training data diversity

The first model version was trained on defect images collected under a single controlled lighting setup. When we deployed to the factory floor, the variation in ambient lighting between early-shift, mid-day, and late-shift conditions caused the model's false positive rate to spike to around 18% - enough to overwhelm the inspection team with spurious rejections and immediately undermine their trust in the system. We had to pause the rollout, re-collect training data across a full shift cycle on all eight lines, and retrain. That added six weeks and required annotating 40,000 additional images. The lesson: factory training data needs to be collected in the factory, across the full environmental range, before the model leaves development.
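A coverage audit of the training set would have flagged this gap before deployment. The sketch below counts labelled images per line and shift and reports under-sampled cells; the shift labels and the minimum count are assumptions, while the eight lines come from the case study.

```python
from collections import Counter
from itertools import product

LINES = range(1, 9)                # eight production lines
SHIFTS = ("early", "mid", "late")  # assumed shift labels


def coverage_gaps(samples, min_images: int) -> list:
    """Return (line, shift) cells with fewer than min_images labelled images.

    samples is an iterable of (line, shift) tuples, one per labelled image.
    """
    counts = Counter(samples)
    return [cell for cell in product(LINES, SHIFTS) if counts[cell] < min_images]


# Hypothetical dataset: every cell has 5 images except line 3, late shift.
samples = [
    (line, shift)
    for line, shift in product(LINES, SHIFTS)
    for _ in range(5)
    if (line, shift) != (3, "late")
]
samples.append((3, "late"))

print(coverage_gaps(samples, min_images=5))  # [(3, 'late')]
```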

False start #2 - Too few camera angles per station

The original design used four cameras per station, not twelve. Four angles seemed sufficient during lab testing on representative parts. In production, we discovered that certain defect types - particularly shallow surface porosity on as-cast faces - were only reliably visible from a narrow range of oblique angles. Defects that the four-camera system missed were being caught downstream by increasingly frustrated operators. Adding eight more cameras per station meant modifying the physical rig, re-running cable management, and extending the aggregation model. It was a material cost overrun against the original specification.

The threshold calibration problem

Every SKU has a different acceptable surface tolerance, and different OEM customers have different incoming quality requirements for the same part. Setting the rejection threshold too tight produced an unacceptably high false positive rate; too loose and the escape rate crept up. We ended up building a threshold management UI that let quality engineers adjust per-SKU, per-defect-class thresholds in real time, backed by a 30-day rolling look-back on false positive and escape rate estimates. That UI was not in the original scope but became one of the most-used parts of the system.
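The rolling look-back behind that UI can be approximated with a small windowed counter, kept per SKU and defect class. This is a minimal sketch under several assumptions: events arrive in date order, ground truth on whether a part was actually defective comes from later audits, and escape rate is taken as escaped defects over all inspected parts (the reading that matches 2% of 60,000 giving 1,200 earlier in this write-up).

```python
from collections import deque
from datetime import date, timedelta


class RollingRates:
    """Outcome counts for one (SKU, defect class) pair over a trailing window."""

    def __init__(self, window_days: int = 30) -> None:
        self.window = timedelta(days=window_days)
        self._events = deque()  # (day, was_rejected, was_defective), oldest first

    def record(self, day: date, was_rejected: bool, was_defective: bool) -> None:
        self._events.append((day, was_rejected, was_defective))
        # Drop anything that has aged out of the look-back window.
        while self._events and self._events[0][0] <= day - self.window:
            self._events.popleft()

    def false_positive_rate(self) -> float:
        """Share of good parts that were rejected."""
        good = [e for e in self._events if not e[2]]
        return sum(1 for e in good if e[1]) / len(good) if good else 0.0

    def escape_rate(self) -> float:
        """Share of inspected parts that were defective but not rejected."""
        if not self._events:
            return 0.0
        escaped = sum(1 for e in self._events if e[2] and not e[1])
        return escaped / len(self._events)


rates = RollingRates(window_days=30)
rates.record(date(2024, 1, 1), was_rejected=True, was_defective=False)   # false positive
rates.record(date(2024, 1, 15), was_rejected=False, was_defective=False)
rates.record(date(2024, 1, 20), was_rejected=False, was_defective=True)  # escape
rates.record(date(2024, 1, 20), was_rejected=True, was_defective=True)
print(rates.false_positive_rate(), rates.escape_rate())  # 0.5 0.25
```

A quality engineer tightening or loosening a threshold would watch these two numbers move in opposite directions, which is the trade-off described above.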

The results, nine months in

The system went fully live across all eight lines in Q1 2024. The figures below cover the nine months through to Q4 2024.

| Metric | Before (baseline) | After (9 months live) |
| --- | --- | --- |
| Defect escape rate | 2.0% | 0.03% |
| Daily units inspected | Manual sampling | 60,000 (100% coverage) |
| Manual inspection headcount | 24 FTE | 4 FTE (exception handling) |
| False positive rate | N/A (manual) | 0.4% |
| Average inspection time per unit | ~8 seconds | <200ms |
| Recall events in period | 2 (prior year) | 0 |
| OEM customer quality score (supplier) | Below threshold | Reinstated preferred status |

0.03% defect escape rate - down from 2% (a 98.5% reduction)

100% unit coverage - every part, every shift (up from spot-sampling only)

0 recall events in the nine months since go-live (down from 2 in the prior year)
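The headline figures can be cross-checked with a few lines of arithmetic. The inputs are the numbers reported above; the one assumption is that the escape rate applies to total daily volume, which is how the earlier 1,200-parts-per-day figure was derived.

```python
DAILY_UNITS = 60_000
BEFORE_PCT = 2.0   # baseline escape rate, percent
AFTER_PCT = 0.03   # escape rate after nine months live, percent


def escaped_per_day(units: int, escape_pct: float) -> int:
    """Defective units shipped per day at a given escape rate."""
    return round(units * escape_pct / 100)


before = escaped_per_day(DAILY_UNITS, BEFORE_PCT)
after = escaped_per_day(DAILY_UNITS, AFTER_PCT)
reduction_pct = (BEFORE_PCT - AFTER_PCT) / BEFORE_PCT * 100

print(before, after, round(reduction_pct, 1))  # 1200 18 98.5
```

In other words, on that reading the move from 2% to 0.03% takes escaped defects from roughly 1,200 a day to roughly 18.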

What happened to the 24-person inspection team

This question comes up in every conversation about projects like this, and it's worth addressing plainly.

The 24 manual inspectors were not let go. The supplier's quality team was already under-resourced relative to their growth targets - they had been turning away new OEM business partly because they couldn't guarantee the incoming quality levels those customers required. With the AI inspection system handling 100% screening, the quality team shifted to higher-value work: incoming supplier inspection, root cause analysis on the rejected-parts data the system now surfaces clearly, process improvement, and customer quality audits.

The 0.4% of parts that fail inspection and reach the exception-handling queue are the genuinely ambiguous cases - surface conditions that fall right on the edge of tolerance, new part numbers that haven't yet accumulated enough inference history, and the occasional false positive that requires a human decision. The team that previously spent its day doing what a camera and a model can do now spends its day doing the work that genuinely needs a trained eye and a quality engineering background.

Three things we'd do differently

Collect training data on the factory floor from day one. We knew this was important. We didn't act on it early enough. Any future vision project of this type should have factory-condition data collection scoped as a first-sprint deliverable, not a late-stage activity. The six-week delay we incurred when the initial model failed under real lighting conditions was entirely preventable.

Design the physical camera rig to be expandable. The jump from 4 to 12 cameras per station was painful precisely because the original rig wasn't designed with expansion in mind. For a project involving physical infrastructure in an active factory, building in mechanical and electrical headroom for changes costs almost nothing at design time and avoids expensive mid-project modifications.

Build the threshold management UI into the core scope. We treated it as a nice-to-have and ended up building it under pressure after go-live, because without it every threshold adjustment required developer intervention. Quality engineers need to own their own thresholds. That capability should be in the system from the first production deployment.

Manufacturing defect detection at 60,000 units a day - with a 0.03% escape rate.

Have a production process where human inspection is the quality bottleneck?
