bfPREP Before and After Example

Before bfPREP™: Looks Clean. Still Wrong.

Most biomedical datasets arrive in tidy rows and columns, but structure is not meaning. Hidden inconsistencies in units, free text, missing derived variables, and fragmented terminology quietly introduce false heterogeneity. Models learn artifacts instead of biology. Subgrouping breaks. Results don’t reproduce across cohorts or sites. This is where “clean” data fails.

After bfPREP™: Standardized. Supplemented. Defensible.

bfPREP™ converts the same inputs into a comparable, feature-complete, and auditable dataset. Synonyms collapse into controlled values. Critical variables are derived explicitly. Ambiguity is flagged, not buried. Every transformation is documented, versioned, and reproducible. The result: data you can trust for modeling, discovery, and decision-making.

bfPREP Dataset Comparison

Before: Raw Data

After: Standardized

⚠️ Why ‘Clean’ Data Still Fails

Free-text categories fragment (RUL vs right upper lobe vs lung apex)
Units drift (lb vs kg, cm vs inches) with ambiguous bare numbers
Regimens explode into free-text strings (“dose-reduced”, “+ RT”) instead of canonical concepts
False heterogeneity dilutes signal and breaks subgrouping
Models learn unit artifacts instead of biology
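As an illustration of the regimen problem in the list above, here is a minimal Python sketch that splits a free-text regimen string into a canonical primary regimen plus modifiers. The synonym table, the modifier patterns, and the function name `normalize_regimen` are hypothetical and cover only the strings in this page's example. The canonical labels follow the standardized example on this page. This is not bfPREP's actual implementation.

```python
import re

# Hypothetical synonym table: keys are regimen strings reduced to letters only.
REGIMEN_SYNONYMS = {
    "folfirinox": "FOLFIRINOX",
    "folfox": "FOLFOX",
    "gemabraxane": "GEM+NAB-PAC",
    "gemcitabinenabpaclitaxel": "GEM+NAB-PAC",
}

# Hypothetical modifier patterns, checked before the primary regimen lookup.
MODIFIER_PATTERNS = [
    (re.compile(r"\(?\s*dose[- ]reduced\s*\)?", re.IGNORECASE), "dose_reduced"),
    (re.compile(r"\+\s*RT\b", re.IGNORECASE), "+RT"),
]


def normalize_regimen(raw):
    """Return (primary, modifiers, unmapped) for a free-text regimen string."""
    text = raw
    modifiers = []
    for pattern, label in MODIFIER_PATTERNS:
        if pattern.search(text):
            modifiers.append(label)
            text = pattern.sub("", text)  # strip the modifier from the string
    # Reduce what is left to lowercase letters so spelling variants collide.
    key = re.sub(r"[^a-z]", "", text.lower())
    primary = REGIMEN_SYNONYMS.get(key)
    # Unknown strings are flagged rather than guessed.
    return primary, modifiers, primary is None


print(normalize_regimen("folfiri-nox"))
print(normalize_regimen("FOLFIRINOX (dose-reduced)"))
print(normalize_regimen("gemcitabine + nab-paclitaxel"))
```

An empty modifier list corresponds to "none" in the standardized example. Returning an explicit `unmapped` flag keeps unknown regimens visible for review instead of silently creating a new category.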

Patient ID	Height	Weight	Lesion Location	Drug Regimen	Start	Progression
P001	5’10”	180	RUL	FOLFIRINOX	2024-01-12	2024-07-03
P002	178 cm	82 kg	right upper lobe	folfiri-nox	2024-02-01	2024-06-28
P003	1.72	165 lb	Lung apex (R)	FOLFIRINOX (dose-reduced)	2024-01-20	2024-08-15
P004	70 in	75	upper lobe right	FOLFOX	2024-03-10	2024-05-02
P005	170cm	68kg	liver	Gem/Abraxane	2024-02-22	2024-09-30
P006	5 ft 6	150lbs	hepatic	gemcitabine + nab-paclitaxel	2024-01-05	2024-10-12
P007	165	60	“R. upper”	FOLFIRINOX	2024-04-01	2024-03-15
P008	1.80 m	92kg	lung	FOLFIRINOX	2024-01-25	(blank)
P009	6’1	210	RUL / mediastinum	FOLFIRINOX + RT	2024-02-18	2024-06-01
P010	185 cm	95	right lung UL	FOLFIRINOX	2024-03-02	2024-07-29
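The height and weight columns above mix feet and inches, metres, centimetres, pounds, kilograms, and bare numbers. The Python sketch below shows one way such values might be converted to centimetres and kilograms while recording when a unit had to be guessed. The page does not state bfPREP's inference rules, so both heuristics are assumptions made for illustration: bare heights are read as metres or centimetres by magnitude, and bare weights take whichever unit gives a BMI between 15 and 50. The function names are hypothetical.

```python
import re

IN_TO_CM = 2.54
LB_TO_KG = 0.45359237

# Feet-and-inches forms such as 5'10", 6'1, or "5 ft 6".
FEET_INCHES = re.compile(r"""(\d+)\s*(?:'|ft)\s*(\d+)?\s*(?:"|in)?""")
# A number with an optional unit, such as "178 cm", "1.80 m", "70 in", or "165".
NUMBER_UNIT = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)?")


def parse_height_cm(raw):
    """Return (height_cm, unit_inferred); height_cm is None if unparseable."""
    # Normalize typographic quotes to ASCII before matching.
    text = raw.strip().lower().replace("\u2019", "'").replace("\u201d", '"')
    match = FEET_INCHES.fullmatch(text)
    if match:
        inches = int(match.group(1)) * 12 + int(match.group(2) or 0)
        return round(inches * IN_TO_CM, 1), False
    match = NUMBER_UNIT.fullmatch(text)
    if not match:
        return None, True
    value, unit = float(match.group(1)), match.group(2)
    factors = {"cm": 1.0, "m": 100.0, "in": IN_TO_CM}
    if unit in factors:
        return round(value * factors[unit], 1), False
    if unit is None:
        # Bare number: guess metres or centimetres from magnitude (assumption).
        if value < 3:
            return round(value * 100, 1), True
        if 100 <= value <= 250:
            return round(value, 1), True
    return None, True


def parse_weight_kg(raw, height_cm, plausible_bmi=(15.0, 50.0)):
    """Return (weight_kg, unit_inferred); weight_kg is None if unresolved."""
    match = NUMBER_UNIT.fullmatch(raw.strip().lower())
    if not match:
        return None, True
    value, unit = float(match.group(1)), match.group(2)
    if unit == "kg":
        return round(value, 1), False
    if unit in ("lb", "lbs"):
        return round(value * LB_TO_KG, 1), False
    if unit is not None or height_cm is None:
        return None, True
    # Bare number: keep whichever unit gives a plausible BMI (assumption).
    low, high = plausible_bmi
    metres = height_cm / 100
    candidates = [kg for kg in (value, value * LB_TO_KG)
                  if low <= kg / metres ** 2 <= high]
    if len(candidates) == 1:
        return round(candidates[0], 1), True
    return None, True  # both or neither unit is plausible: leave for review


height, height_inferred = parse_height_cm("5\u201910\u201d")  # P001
weight, weight_inferred = parse_weight_kg("180", height)       # P001, bare number
print(height, height_inferred, weight, weight_inferred)
```

For P001 this reads the bare weight 180 as pounds, because 180 kg at 177.8 cm would fall outside the assumed BMI range. The `unit_inferred` flag is kept separate from the converted value so a later review step can decide how much to trust it.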

New Columns: Derived variables (BMI, PFS, flags)

Standardized: Normalized units and controlled vocabularies

✅ What bfPREP™ Delivers

Standardization: Lesion locations mapped to organ and subsite categories
Normalization: Regimens split into a primary regimen plus modifiers
Supplementation: Derived columns (BMI, categories, time-to-event) added with explicit rules
Quality Control: Censoring and validity flags included for modeling reliability
Auditability: Each derived column has a manifest (inputs, method, version, coverage)
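To make the supplementation and auditability items above concrete, here is a minimal Python sketch of a derived BMI column that carries its own manifest (inputs, method, version, coverage). The `ColumnManifest` class, its field types, the version string, and the function names are hypothetical, and the page does not describe the real manifest format. The category cut-offs are the standard WHO adult thresholds. The "Underwt" and "Obese" labels are assumed by analogy with the abbreviated labels in the example, and the input rows are taken from the standardized example on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnManifest:
    column: str
    inputs: tuple
    method: str
    version: str
    coverage: float  # fraction of rows with a non-missing derived value


def bmi(height_cm, weight_kg):
    """BMI in kg/m^2, rounded to one decimal; None if an input is missing."""
    if height_cm is None or weight_kg is None:
        return None
    return round(weight_kg / (height_cm / 100) ** 2, 1)


def bmi_category(value):
    """WHO adult categories; label spellings are illustrative."""
    if value is None:
        return None
    if value < 18.5:
        return "Underwt"
    if value < 25:
        return "Normal"
    if value < 30:
        return "Overwt"
    return "Obese"


def derive_bmi(rows, version="0.1.0"):
    """Return the derived BMI values together with a manifest describing them."""
    values = [bmi(r["height_cm"], r["weight_kg"]) for r in rows]
    coverage = sum(v is not None for v in values) / len(rows) if rows else 0.0
    manifest = ColumnManifest(
        column="bmi",
        inputs=("height_cm", "weight_kg"),
        method="weight_kg / (height_cm / 100) ** 2, rounded to 1 decimal",
        version=version,
        coverage=coverage,
    )
    return values, manifest


rows = [
    {"id": "P001", "height_cm": 177.8, "weight_kg": 81.6},
    {"id": "P004", "height_cm": 177.8, "weight_kg": 75.0},
    {"id": "P008", "height_cm": 180.0, "weight_kg": 92.0},
]
values, manifest = derive_bmi(rows)
print(values, [bmi_category(v) for v in values])
print(manifest)
```

Returning the manifest alongside the values means the rule that produced a column travels with the column, so a reviewer can see the inputs, formula, and coverage without reading the code.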

Patient ID	Height (cm)	Weight (kg)	BMI	BMI Category	Lesion Organ	Lesion Subsite	Multifocal	Regimen	Modifiers	PFS (days)	PFS Valid	PFS Censored	Needs Review
P001	177.8	81.6	25.8	Overwt	lung	UL_right	false	FOLFIRINOX	none	173	true	false	false
P002	178.0	82.0	25.9	Overwt	lung	UL_right	false	FOLFIRINOX	none	148	true	false	false
P003	172.0	74.8	25.3	Overwt	lung	apex_right	false	FOLFIRINOX	dose_reduced	208	true	false	true*
P004	177.8	75.0	23.7	Normal	lung	UL_right	false	FOLFOX	none	53	true	false	true*
P005	170.0	68.0	23.5	Normal	liver	liver	false	GEM+NAB-PAC	none	221	true	false	false
P006	167.6	68.0	24.2	Normal	liver	liver	false	GEM+NAB-PAC	none	281	true	false	false
P007	165.0	60.0	22.0	Normal	lung	UL_right	false	FOLFIRINOX	none	-17	false	false	true*
P008	180.0	92.0	28.4	Overwt	lung	lung_unspecified	false	FOLFIRINOX	none	—	—	true	true*
P009	185.4	95.3	27.7	Overwt	lung	UL_right	true	FOLFIRINOX	+RT	104	true	false	true*
P010	185.0	95.0	27.8	Overwt	lung	UL_right	false	FOLFIRINOX	none	149	true	false	false

* Needs Review indicates unit inference, ambiguous free text, or QC failures (for example, negative PFS in P007 or a missing progression date in P008).
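The two PFS examples in the note above (P007 and P008) can be expressed as explicit rules. The Python sketch below is one possible reading of them, not bfPREP's documented logic. The function name `derive_pfs`, the dictionary keys, and the choice to treat a PFS of zero days as valid are assumptions. The review flag here covers only PFS problems, whereas Needs Review on this page also reflects unit inference and ambiguous free text.

```python
from datetime import date


def derive_pfs(start, progression):
    """Derive PFS in days with validity, censoring, and review flags."""
    if progression is None:
        # No progression date: treat as censored and leave PFS missing.
        return {"pfs_days": None, "pfs_valid": None,
                "pfs_censored": True, "pfs_needs_review": True}
    days = (progression - start).days
    valid = days >= 0  # progression before treatment start fails QC
    return {"pfs_days": days, "pfs_valid": valid,
            "pfs_censored": False, "pfs_needs_review": not valid}


p001 = derive_pfs(date(2024, 1, 12), date(2024, 7, 3))
p007 = derive_pfs(date(2024, 4, 1), date(2024, 3, 15))
p008 = derive_pfs(date(2024, 1, 25), None)
print(p001)
print(p007)
print(p008)
```

A time-to-event analysis would normally need a last follow-up date for censored patients. The example data has none, so the sketch leaves PFS missing for P008 rather than inventing a value.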

Don’t debug your data after models fail.
Start with a focused bfPREP™ Data Prep Audit and get a clear, actionable view of what’s holding your data and decisions back.

Request your bfPREP™ Data Prep Audit
Understand the risk. Fix it early. Move forward with confidence.

Leap into the future of AI-powered drug development with BullFrog AI.