The situation before we got involved
Automotive manufacturing doesn't tolerate defects. A single faulty component reaching an assembly line downstream can halt production, trigger a warranty claim, or - in the worst case - end up in a recall that costs far more than the part was ever worth. Quality control at the stamping, casting, and machined-parts level is therefore not optional; the question is only how well it works.
For this Tier 1 supplier, that meant a team of visual inspectors stationed at the end of each production line, eyeballing roughly 60,000 units a day across eight lines. The process was manual and shift-dependent, and inspectors were fatigued by hour four of a nine-hour run. The supplier's defect escape rate - the percentage of shipped units that left the facility with an undetected defect - sat at around 2%. At production volume, that translated to roughly 1,200 defective parts shipped every day.
Their OEM customers had started flagging it. Two recall events in eighteen months had damaged the relationship with their largest account. The plant manager knew the manual inspection process was the root cause and had already explored off-the-shelf machine vision systems, but nothing he'd seen could handle the variation in part geometry across their product range.
"We weren't looking for a system that caught 90% of defects. Our OEM customers were already raising flags. We needed something that could genuinely replace a human inspector - across every part type, every shift, every line."
Why this was harder than it looked
Automated visual inspection sounds like a solved problem. Off-the-shelf systems exist. The challenge here was in the specifics - three things that ruled out every vendor solution we evaluated before committing to a custom build.
Part geometry variation across the product range
The supplier manufactured over 340 distinct part numbers across the eight lines. Each part number has its own surface profile, reflectance characteristics, and set of acceptable defect tolerances. A surface scratch that is acceptable on a structural bracket is grounds for rejection on a visible-face trim component. Any inspection system had to handle that variation in real time - not with a separate setup for every SKU, but dynamically, based on whatever was running on the line at any given moment.
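One way to avoid a separate station setup per SKU is to key inspection settings by part number and switch them whenever the line reports a changeover. This is a minimal sketch, assuming the station receives the current part number from some upstream signal; `InspectionProfile`, `ProfileRegistry`, the part numbers and the millimetre tolerances are all hypothetical illustrations, not the supplier's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InspectionProfile:
    """Hypothetical per-part-number inspection settings."""
    part_number: str
    surface_class: str       # e.g. "structural" or "visible_face"
    max_scratch_mm: float    # longest scratch tolerated before rejection


class ProfileRegistry:
    """Holds every profile and tracks which one matches the part now on the line."""

    def __init__(self, profiles):
        self._by_part = {p.part_number: p for p in profiles}
        self.active = None

    def on_part_change(self, part_number: str) -> InspectionProfile:
        # Called when the line reports a changeover; no station re-setup needed.
        if part_number not in self._by_part:
            raise KeyError(f"no inspection profile for part {part_number!r}")
        self.active = self._by_part[part_number]
        return self.active

    def is_reject(self, scratch_length_mm: float) -> bool:
        if self.active is None:
            raise RuntimeError("no part number reported yet")
        return scratch_length_mm > self.active.max_scratch_mm


# Hypothetical part numbers: the same 2 mm scratch passes on one, fails on the other.
registry = ProfileRegistry([
    InspectionProfile("BRKT-100", "structural", max_scratch_mm=5.0),
    InspectionProfile("TRIM-200", "visible_face", max_scratch_mm=0.0),
])
registry.on_part_change("BRKT-100")
print(registry.is_reject(2.0))  # False: tolerated on a structural bracket
registry.on_part_change("TRIM-200")
print(registry.is_reject(2.0))  # True: rejected on visible-face trim
```

Keeping the lookup in data rather than in per-SKU code paths means adding a new part number is a configuration change, not a redeploy.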
Line speed and latency constraints
Parts move through the inspection station at line speed. The system had a window of under 200 milliseconds per part to capture, process, and issue a rejection signal before the part moved past the pneumatic ejector mechanism. That is not enough time to send image data off-site, run inference in the cloud, and receive a result. Everything had to run at the edge.
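It can help to treat the constraint as a latency budget: every stage between capture and the ejector signal has to fit inside the 200 ms window together. The stage names and millisecond figures in this sketch are hypothetical placeholders, including the network round trip, and real values would have to come from profiling the station.

```python
# Hypothetical stage timings in milliseconds; real values would come from profiling.
WINDOW_MS = 200.0  # capture-to-ejector window quoted for this line


def fits_window(stage_ms: dict, window_ms: float = WINDOW_MS) -> tuple:
    """Return (fits, headroom_ms) for stages that must all finish inside the window."""
    total = sum(stage_ms.values())
    return total <= window_ms, window_ms - total


edge = {"capture": 15.0, "preprocess": 10.0, "inference": 60.0,
        "decision": 5.0, "eject_signal": 10.0}
# Same pipeline plus a hypothetical round trip to an off-site service.
offsite = dict(edge, network_round_trip=150.0)

print(fits_window(edge))     # (True, 100.0)
print(fits_window(offsite))  # (False, -50.0)
```

With these made-up numbers the edge pipeline leaves 100 ms of headroom, while adding a single network round trip pushes the same pipeline 50 ms past the deadline.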
Lighting and surface conditions
Stamped and cast metal parts have surfaces that are challenging for vision systems. Oil films from forming lubricants, heat-induced discolouration, and the transition between machined and as-cast faces all produce surface patterns that models trained only on clean images mistake for defects. Getting the false positive rate low enough to avoid operator alert fatigue, while keeping the true defect detection rate high, required careful work on both the imaging setup and the training dataset.
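The tension between those two goals is easier to see with the standard confusion-matrix rates, treating "defective" as the positive class. The counts below are invented for illustration; the point is that when genuine defects are rare, even a few percent of false positives means most rejections an operator sees are false alarms.

```python
def inspection_rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Confusion-matrix rates with 'defective' as the positive class."""
    return {
        "detection_rate": tp / (tp + fn),            # share of real defects caught
        "false_positive_rate": fp / (fp + tn),       # share of good parts wrongly rejected
        "rejections_that_are_real": tp / (tp + fp),  # what operators actually experience
    }


# Invented counts for 10,000 parts, of which 100 are genuinely defective.
rates = inspection_rates(tp=95, fn=5, fp=300, tn=9_600)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

With these counts the model catches 95% of defects at roughly a 3% false positive rate, yet only about one rejection in four is a real defect.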
How the system works
The production architecture runs across five stages, each refined significantly from the original specification as we encountered real factory conditions.
What went wrong, and what we learned
We had two significant setbacks during the build. Documenting them honestly is more useful than a sanitised version of events.
False start #1 - Insufficient training data diversity
The first model version was trained on defect images collected under a single controlled lighting setup. When we deployed to the factory floor, the variation in ambient lighting between early-shift, mid-day, and late-shift conditions caused the model's false positive rate to spike to around 18% - enough to overwhelm the inspection team with spurious rejections and immediately undermine their trust in the system. We had to pause the rollout, re-collect training data across a full shift cycle on all eight lines, and retrain. That added six weeks and required annotating 40,000 additional images. The lesson: factory training data needs to be collected in the factory, across the full environmental range, before the model leaves development.
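In practice the fix amounts to checking coverage before training rather than discovering gaps after deployment. A minimal sketch, assuming each annotated image is tagged with the line and shift it was captured on, that lists every (line, shift) cell that is under-represented; the shift labels and the `min_per_cell` threshold are hypothetical.

```python
from collections import Counter
from itertools import product

LINES = range(1, 9)                     # the eight production lines
SHIFTS = ("early", "midday", "late")    # hypothetical labels for the shift conditions


def coverage_gaps(samples, min_per_cell: int) -> list:
    """Return (line, shift) cells with fewer than min_per_cell annotated images.

    `samples` yields one (line, shift) tuple per annotated image.
    """
    counts = Counter(samples)
    return [cell for cell in product(LINES, SHIFTS) if counts[cell] < min_per_cell]


# A dataset captured under one controlled setup: every image from line 1 at midday.
lab_only = [(1, "midday")] * 500
gaps = coverage_gaps(lab_only, min_per_cell=100)
print(len(gaps))  # 23 of the 24 cells have no coverage
```

Running a check like this as a gate in the training pipeline turns "collect across the full environmental range" from a guideline into something that fails loudly.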
False start #2 - Too few camera angles per station
The original design used four cameras per station rather than the twelve in the final system. Four angles seemed sufficient during lab testing on representative parts. In production, we discovered that certain defect types, particularly shallow surface porosity on as-cast faces, were only reliably visible from a narrow range of oblique angles. Defects the four-camera system missed were being caught downstream by operators, who were increasingly frustrated. Adding eight more cameras per station required modifying the physical rig, redoing the cable management, and extending the aggregation model. It was a material cost overrun against the original specification.
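The article does not describe how the aggregation model combines views, so the following is only an assumption: a simple max over per-camera defect scores, which lets one oblique camera trigger a rejection that the other views would never support. Camera names, scores and the 0.5 threshold are hypothetical.

```python
def aggregate_views(scores_by_camera: dict, threshold: float) -> tuple:
    """Reject if the most confident single view clears the threshold.

    Max-pooling across cameras is an assumption: it lets one oblique view
    flag a defect that the other views cannot see.
    Returns (reject, camera_that_drove_the_decision).
    """
    camera, top_score = max(scores_by_camera.items(), key=lambda kv: kv[1])
    return top_score >= threshold, camera


# Hypothetical 12-camera station: only one oblique camera sees shallow porosity.
scores = {f"cam_{i:02d}": 0.1 for i in range(1, 12)}
scores["cam_oblique_12"] = 0.9

print(aggregate_views(scores, threshold=0.5))  # (True, 'cam_oblique_12')
mean_score = sum(scores.values()) / len(scores)
print(mean_score >= 0.5)  # False: averaging dilutes the one view that matters
```

A learned aggregation model would be more nuanced than a plain max, but the contrast with averaging shows why adding narrow-angle views only helps if the aggregation step lets a single view carry the decision.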
The threshold calibration problem
Every SKU has a different acceptable surface tolerance, and different OEM customers have different incoming quality requirements for the same part. Setting the rejection threshold too tight produced an unacceptably high false positive rate; too loose and the escape rate crept up. We ended up building a threshold management UI that let quality engineers adjust per-SKU, per-defect-class thresholds in real time, backed by a 30-day rolling look-back on false positive and escape rate estimates. That UI was not in the original scope but became one of the most-used parts of the system.
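The mechanics behind such a UI can be small: a threshold keyed by (SKU, defect class) that engineers edit directly, plus a rolling estimate that discards anything older than 30 days. This sketch assumes the false positive estimate comes from human review of rejected parts (the share of rejections reviewers overturn); `RollingRate`, `set_threshold` and the example SKU are hypothetical, and an escape-rate estimate would need downstream or customer-return data that is not modelled here.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Optional


class RollingRate:
    """Share of events flagged True within a trailing window (default 30 days).

    Assumes events are added in timestamp order.
    """

    def __init__(self, window: timedelta = timedelta(days=30)):
        self.window = window
        self._events = deque()  # (timestamp, flagged) pairs, oldest first

    def add(self, ts: datetime, flagged: bool) -> None:
        self._events.append((ts, flagged))

    def rate(self, now: datetime) -> Optional[float]:
        cutoff = now - self.window
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()  # discard events that fell out of the window
        if not self._events:
            return None
        return sum(flag for _, flag in self._events) / len(self._events)


# Hypothetical store: one editable threshold and one estimate per (SKU, defect class).
thresholds = {}
overturned_rejections = defaultdict(RollingRate)


def set_threshold(sku: str, defect_class: str, score: float) -> None:
    """The kind of write a quality engineer's UI action would make; no redeploy."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("threshold must be a score between 0 and 1")
    thresholds[(sku, defect_class)] = score


key = ("BRKT-100", "scratch")
set_threshold(*key, 0.7)
now = datetime(2024, 6, 30)
for days_ago, overturned in [(45, True), (10, True), (2, False), (1, False), (0, False)]:
    overturned_rejections[key].add(now - timedelta(days=days_ago), overturned)
print(overturned_rejections[key].rate(now))  # 0.25: the 45-day-old event is ignored
```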
The results, nine months in
The system went fully live across all eight lines in Q1 2024. The figures below cover the nine months through to Q4 2024.
What happened to the 24-person inspection team
This question comes up in every conversation about projects like this, and it's worth addressing plainly.
The 24 manual inspectors were not let go. The supplier's quality team was already under-resourced relative to their growth targets - they had been turning away new OEM business partly because they couldn't guarantee the incoming quality levels those customers required. With the AI inspection system handling 100% screening, the quality team shifted to higher-value work: incoming supplier inspection, root cause analysis on the rejected-parts data the system now surfaces clearly, process improvement, and customer quality audits.
The 0.4% of parts that fail inspection and reach the exception-handling queue are the genuinely ambiguous cases - surface conditions that fall right on the edge of tolerance, new part numbers that haven't yet accumulated enough inference history, and the occasional false positive that requires a human decision. The team that previously spent its day doing what a camera and a model can do now spends its day doing the work that genuinely needs a trained eye and a quality engineering background.
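A sketch of how that routing might look, under stated assumptions: parts scoring within a small margin of the threshold, and rejections on part numbers with little inference history, go to a person; everything else is decided automatically. `route_part`, the `margin` and `min_history` defaults, and the three-way `Route` enum are hypothetical, not the system's documented logic.

```python
from enum import Enum


class Route(Enum):
    PASS = "pass"
    REJECT = "reject"
    HUMAN_REVIEW = "human_review"


def route_part(score: float, threshold: float, history_count: int,
               margin: float = 0.05, min_history: int = 500) -> Route:
    """Decide confident cases automatically and queue ambiguous ones for a person.

    `margin` and `min_history` are hypothetical tuning values.
    """
    if abs(score - threshold) < margin:
        return Route.HUMAN_REVIEW   # right on the edge of tolerance
    if score < threshold:
        return Route.PASS
    if history_count < min_history:
        return Route.HUMAN_REVIEW   # rejection on a part number with little history
    return Route.REJECT


print(route_part(0.20, 0.5, history_count=10_000))  # Route.PASS
print(route_part(0.52, 0.5, history_count=10_000))  # Route.HUMAN_REVIEW
print(route_part(0.90, 0.5, history_count=40))      # Route.HUMAN_REVIEW
print(route_part(0.90, 0.5, history_count=10_000))  # Route.REJECT
```

Widening `margin` or raising `min_history` sends more parts to the queue; in a setup like this, those two knobs are what would set the size of the exception workload.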
Three things we'd do differently
Collect training data on the factory floor from day one. We knew this was important. We didn't act on it early enough. Any future vision project of this type should have factory-condition data collection scoped as a first-sprint deliverable, not a late-stage activity. The six-week delay we incurred when the initial model failed under real lighting conditions was entirely preventable.
Design the physical camera rig to be expandable. The jump from 4 to 12 cameras per station was painful precisely because the original rig wasn't designed with expansion in mind. For a project involving physical infrastructure in an active factory, building in mechanical and electrical headroom for changes costs almost nothing at design time and avoids expensive mid-project modifications.
Build the threshold management UI into the core scope. We treated it as a nice-to-have and ended up building it under pressure after go-live, because without it every threshold adjustment required a developer's intervention. Quality engineers need to own their own thresholds, and that capability should be in the system from the first production deployment.