Before bfPREP™: Looks Clean. Still Wrong.
Most biomedical datasets arrive in tidy rows and columns, but structure is not meaning. Hidden inconsistencies in units, free text, missing derived variables, and fragmented terminology quietly introduce false heterogeneity. Models learn artifacts instead of biology. Subgrouping breaks. Results don’t reproduce across cohorts or sites. This is where “clean” data fails.
After bfPREP™: Standardized. Supplemented. Defensible.
bfPREP™ converts the same inputs into a comparable, feature-complete, and auditable dataset. Synonyms collapse into controlled values. Critical variables are derived explicitly. Ambiguity is flagged, not buried. Every transformation is documented, versioned, and reproducible. The result: data you can trust for modeling, discovery, and decision-making.
Before: Raw Data
After: Standardized
⚠️ Why ‘Clean’ Data Still Fails
- Free-text categories fragment (RUL vs right upper lobe vs lung apex)
- Units drift (lb vs kg, cm vs inches) with ambiguous bare numbers
- Regimens explode into strings (dose-reduced, +RT) instead of canonical concepts
- False heterogeneity dilutes signal and breaks subgrouping
- Models learn unit artifacts instead of biology
| Patient ID | Height | Weight | Lesion Location | Drug Regimen | Start | Progression |
|---|---|---|---|---|---|---|
| P001 | 5’10” | 180 | RUL | FOLFIRINOX | 2024-01-12 | 2024-07-03 |
| P002 | 178 cm | 82 kg | right upper lobe | folfiri-nox | 2024-02-01 | 2024-06-28 |
| P003 | 1.72 | 165 lb | Lung apex (R) | FOLFIRINOX (dose-reduced) | 2024-01-20 | 2024-08-15 |
| P004 | 70 in | 75 | upper lobe right | FOLFOX | 2024-03-10 | 2024-05-02 |
| P005 | 170cm | 68kg | liver | Gem/Abraxane | 2024-02-22 | 2024-09-30 |
| P006 | 5 ft 6 | 150lbs | hepatic | gemcitabine + nab-paclitaxel | 2024-01-05 | 2024-10-12 |
| P007 | 165 | 60 | “R. upper” | FOLFIRINOX | 2024-04-01 | 2024-03-15 |
| P008 | 1.80 m | 92kg | lung | FOLFIRINOX | 2024-01-25 | (blank) |
| P009 | 6’1 | 210 | RUL / mediastinum | FOLFIRINOX + RT | 2024-02-18 | 2024-06-01 |
| P010 | 185 cm | 95 | right lung UL | FOLFIRINOX | 2024-03-02 | 2024-07-29 |
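As a rough illustration of the unit drift in the table above, the following Python sketch parses the height strings into centimetres and flags bare numbers whose unit had to be guessed. The function name `parse_height_cm`, the regular expressions, and the rule that a bare value below 3 is read as metres are illustrative assumptions, not bfPREP™'s actual rules.

```python
import re

IN_TO_CM = 2.54  # exact by definition


def parse_height_cm(raw):
    """Return (height_cm, unit_inferred) for a raw height string.

    Hypothetical helper: unit_inferred is True when no unit was given
    and the unit had to be guessed from the magnitude.
    """
    s = raw.strip().lower()
    # Normalize typographic and ASCII quote marks to unit words.
    for src, dst in (("’", " ft "), ("'", " ft "), ("”", " in "), ('"', " in ")):
        s = s.replace(src, dst)
    s = s.strip()

    # Feet with optional inches: "5 ft 10 in", "6 ft 1", "5 ft 6"
    m = re.fullmatch(r"(\d+)\s*ft\s*(\d+)?\s*(?:in)?", s)
    if m:
        total_inches = int(m.group(1)) * 12 + int(m.group(2) or 0)
        return round(total_inches * IN_TO_CM, 1), False

    # Single number with an optional unit: "178 cm", "1.80 m", "70 in", "165"
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(cm|m|in)?", s)
    if not m:
        raise ValueError(f"unrecognized height: {raw!r}")
    value, unit = float(m.group(1)), m.group(2)
    if unit == "cm":
        return round(value, 1), False
    if unit == "m":
        return round(value * 100, 1), False
    if unit == "in":
        return round(value * IN_TO_CM, 1), False

    # Bare number (assumption): below 3 reads as metres, otherwise centimetres.
    return (round(value * 100, 1) if value < 3 else round(value, 1)), True
```

With these assumed rules, `parse_height_cm("1.72")` returns `(172.0, True)`: the value is converted, but the `True` flag records that the unit was inferred rather than stated, so the row can be routed to review instead of being silently trusted.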
New Columns: Derived variables (BMI, PFS, flags)
Standardized: Normalized units and controlled vocabularies
✅ What bfPREP™ Delivers
- Standardization: Lesion locations mapped to organ and subsite categories
- Normalization: Regimens normalized into primary + modifiers
- Supplementation: Derived columns (BMI, categories, time-to-event) added with explicit rules
- Quality Control: Censoring and validity flags included for modeling reliability
- Auditability: Each derived column has a manifest (inputs, method, version, coverage)
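To make the regimen normalization item concrete, here is a minimal Python sketch that splits a free-text regimen into a canonical primary regimen plus modifiers. The lookup table, the modifier patterns, and the function name `normalize_regimen` are illustrative assumptions built from a handful of example strings, not bfPREP™'s controlled vocabulary.

```python
import re

# Hypothetical synonym table: keys are lowercased strings with non-letters removed.
CANONICAL = {
    "folfirinox": "FOLFIRINOX",
    "folfox": "FOLFOX",
    "gemabraxane": "GEM+NAB-PAC",
    "gemcitabinenabpaclitaxel": "GEM+NAB-PAC",
}

# Hypothetical modifier patterns, each mapped to a controlled label.
MODIFIER_PATTERNS = [
    (re.compile(r"\(?\s*dose[- ]reduced\s*\)?", re.IGNORECASE), "dose_reduced"),
    (re.compile(r"\+\s*rt\b", re.IGNORECASE), "+RT"),
]


def normalize_regimen(raw):
    """Return (primary, modifiers, needs_review) for a raw regimen string."""
    text = raw
    modifiers = []
    # Pull out known modifiers first so they do not pollute the lookup key.
    for pattern, label in MODIFIER_PATTERNS:
        if pattern.search(text):
            modifiers.append(label)
            text = pattern.sub("", text)
    key = re.sub(r"[^a-z]", "", text.lower())
    primary = CANONICAL.get(key)
    # Unknown strings are flagged for review instead of being guessed.
    return primary, modifiers, primary is None
```

Under these assumptions, `normalize_regimen("FOLFIRINOX (dose-reduced)")` returns `("FOLFIRINOX", ["dose_reduced"], False)`, while an unmapped string returns `None` as its primary regimen with the review flag set.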
| Patient ID | Height (cm) | Weight (kg) | BMI | BMI Cat | Lesion Organ | Lesion Sub | Multifocal | Regimen | Modifiers | PFS (days) | PFS Valid | PFS Censored | Needs Review |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P001 | 177.8 | 81.6 | 25.8 | Overwt | lung | UL_right | false | FOLFIRINOX | none | 173 | true | false | false |
| P002 | 178.0 | 82.0 | 25.9 | Overwt | lung | UL_right | false | FOLFIRINOX | none | 148 | true | false | false |
| P003 | 172.0 | 74.8 | 25.3 | Overwt | lung | apex_right | false | FOLFIRINOX | dose_reduced | 208 | true | false | true* |
| P004 | 177.8 | 75.0 | 23.7 | Normal | lung | UL_right | false | FOLFOX | none | 53 | true | false | true* |
| P005 | 170.0 | 68.0 | 23.5 | Normal | liver | liver | false | GEM+NAB-PAC | none | 221 | true | false | false |
| P006 | 167.6 | 68.0 | 24.2 | Normal | liver | liver | false | GEM+NAB-PAC | none | 281 | true | false | false |
| P007 | 165.0 | 60.0 | 22.0 | Normal | lung | UL_right | false | FOLFIRINOX | none | -17 | false | false | true* |
| P008 | 180.0 | 92.0 | 28.4 | Overwt | lung | lung_unspecified | false | FOLFIRINOX | none | — | — | true | true* |
| P009 | 185.4 | 95.3 | 27.7 | Overwt | lung | UL_right | true | FOLFIRINOX | +RT | 104 | true | false | true* |
| P010 | 185.0 | 95.0 | 27.8 | Overwt | lung | UL_right | false | FOLFIRINOX | none | 149 | true | false | false |
* Needs Review indicates unit inference, ambiguous free text, or QC failures (for example, negative PFS in P007 or a missing progression date in P008).
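The PFS columns and the QC flags noted above can be derived with a few explicit rules. The sketch below is a minimal Python illustration; the function `derive_pfs` and its output keys are hypothetical names, and it assumes no last-follow-up date is available, so a missing progression date yields a censored record with no duration.

```python
from datetime import date


def derive_pfs(start, progression):
    """Derive PFS in days plus validity and censoring flags.

    start: datetime.date of treatment start.
    progression: datetime.date of progression, or None when not recorded.
    """
    if progression is None:
        # No event observed: censored. Without a follow-up date (assumption),
        # no duration can be computed, so the record is sent to review.
        return {
            "pfs_days": None,
            "pfs_valid": None,
            "pfs_censored": True,
            "needs_review": True,
        }
    days = (progression - start).days
    valid = days >= 0  # progression before start is a QC failure
    return {
        "pfs_days": days,
        "pfs_valid": valid,
        "pfs_censored": False,
        "needs_review": not valid,
    }
```

For example, `derive_pfs(date(2024, 4, 1), date(2024, 3, 15))` yields `pfs_days` of -17 with `pfs_valid` set to `False`, which keeps the impossible interval visible rather than dropping or silently correcting it.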
Don’t debug your data after models fail.
Start with a focused bfPREP™ Data Prep Audit and get a clear, actionable view of what’s holding your data and decisions back.
Request your bfPREP™ Data Prep Audit
Understand the risk. Fix it early. Move forward with confidence.