feature engineering
Creating new input variables from raw data
"Feature engineering extracted meaningful signals from the timestamp data."
Origin: From Latin `factura` (a making) + Old French `engin` (skill), from Latin `ingenium` (cleverness)
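Here is a minimal sketch of the timestamp example using only Python's standard `datetime` module. It turns one ISO-8601 timestamp into several numeric features that a model can use. The function name `timestamp_features` and the particular features are illustrative assumptions, not a fixed recipe.

```python
from datetime import datetime


def timestamp_features(ts: str) -> dict:
    """Derive calendar features from an ISO-8601 timestamp string."""
    dt = datetime.fromisoformat(ts)
    return {
        "hour": dt.hour,
        "day_of_week": dt.weekday(),      # Monday = 0 ... Sunday = 6
        "is_weekend": dt.weekday() >= 5,  # Saturday or Sunday
        "month": dt.month,
    }


print(timestamp_features("2024-03-16T14:30:00"))
# {'hour': 14, 'day_of_week': 5, 'is_weekend': True, 'month': 3}
```

A raw timestamp is a single opaque value. Split into hour, weekday, and weekend flags, it can expose daily and weekly patterns to a model.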
Data science and analytics terminology
dimensionality reduction
Reducing the number of variables while preserving information
"Dimensionality reduction made the dataset manageable for visualization."
Origin: From Latin `dimensio` (a measuring), from `dimetiri` (to measure out)
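Principal component analysis (PCA) is one common dimensionality-reduction technique, and it is used here only as an illustration. The sketch below assumes NumPy and computes the components with a singular value decomposition. The synthetic data is made up: two of its three columns carry the same signal.

```python
import numpy as np


def pca(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X onto its top-k principal components."""
    Xc = X - X.mean(axis=0)  # center each feature at zero
    # Rows of Vt are the principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


rng = np.random.default_rng(0)
t = rng.normal(size=100)
# Column 2 is exactly twice column 1; column 3 is tiny noise
X = np.column_stack([t, 2 * t, rng.normal(scale=0.01, size=100)])
Z = pca(X, k=1)
print(Z.shape)                        # (100, 1)
print(Z.var() / X.var(axis=0).sum())  # fraction of variance kept, close to 1
```

Three columns become one while nearly all of the variance is retained, because the first two columns are redundant.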
cross-validation
Evaluating a model by repeatedly training it on subsets of the data and testing it on the held-out remainder
"Cross-validation gave a reliable estimate of how well the model generalizes to unseen data."
Origin: From Latin `crux` (cross) + `validus` (strong, effective), from `valere` (to be strong)
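The procedure can be sketched as k-fold cross-validation. The indices are split into k folds, each fold is held out once for testing, and the model is trained on the rest. This minimal version uses contiguous, unshuffled folds and a hypothetical baseline model that always predicts the training mean. Real workflows usually shuffle the data first and plug in an actual model.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds."""
    # Spread the remainder so fold sizes differ by at most one
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate_mean_baseline(y, k):
    """Mean absolute error per fold for a model that predicts the training mean."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)
        scores.append(sum(abs(y[i] - prediction) for i in test) / len(test))
    return scores


scores = cross_validate_mean_baseline([1, 2, 3, 4, 5, 6], k=3)
print(scores)                     # [3.0, 0.5, 3.0]
print(sum(scores) / len(scores))  # average error across folds
```

Every data point is used for testing exactly once, so the averaged score reflects performance on data the model did not see during fitting.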
precision
The proportion of positive predictions that are correct
"High precision means few false positives in our spam detection."
Origin: From Latin `praecisio` (a cutting off), from `praecidere` (to cut off), from `prae-` (before) + `caedere` (to cut)
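Expressed as counts, precision is TP / (TP + FP), where TP is the number of true positives and FP the number of false positives. The following is a minimal pure-Python sketch. Returning 0.0 when nothing is predicted positive is a convention assumed here, and the labels are made up.

```python
def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): the share of predicted positives that are truly positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    return tp / (tp + fp) if tp + fp else 0.0


# 1 = spam, 0 = legitimate (made-up labels)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
print(precision(y_true, y_pred))  # 2 of the 3 flagged emails are spam
```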
recall
The proportion of actual positives correctly identified
"High recall ensures we catch most fraudulent transactions."
Origin: From `re-` (again, back) + `call`, from Old Norse `kalla` (to call, cry out)
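Expressed as counts, recall is TP / (TP + FN), where FN is the number of actual positives that the model missed. This sketch assumes a result of 0.0 when there are no actual positives, and the labels are made up.

```python
def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): the share of actual positives the model caught."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# 1 = fraudulent, 0 = legitimate (made-up labels)
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0]
print(recall(y_true, y_pred))  # 0.5: two of the four frauds were caught
```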
F1 score
The harmonic mean of precision and recall
"The F1 score balances precision and recall into a single metric."
Origin: `F` from F-measure (F-score), a weighted harmonic mean of precision and recall; the `1` is the weighting parameter beta = 1, which weights precision and recall equally
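Written out, F1 = 2PR / (P + R) for precision P and recall R. Because F1 is a harmonic mean, it stays low whenever either component is low, as the second call below shows.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1(2 / 3, 0.5))  # 4/7, about 0.571
print(f1(1.0, 0.1))    # about 0.18, far below the arithmetic mean of 0.55
```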
ROC curve
A plot of the true positive rate against the false positive rate as the classification threshold varies
"The ROC curve demonstrated excellent discrimination between classes."
Origin: Acronym for `Receiver Operating Characteristic`, from signal detection work on radar receivers in the 1940s
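Each point on the curve is the (false positive rate, true positive rate) pair produced by one threshold. This minimal sketch sweeps the threshold over every distinct score and treats `score >= threshold` as a positive prediction, which is an assumed convention. The labels and scores are made up.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold down through every distinct score."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y != 1)
        points.append((fp / neg, tp / pos))
    return points


for fpr, tpr in roc_points([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2]):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold moves the curve from (0, 0) toward (1, 1). A curve that rises steeply near the left edge indicates good discrimination.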
AUC
Area Under the (ROC) Curve, summarizing classifier performance across all thresholds as a single number between 0 and 1
"An AUC of 0.95 indicates excellent predictive ability."
Origin: Acronym for `Area Under the Curve`: Latin `area` (open space) + Old English `under` + Latin `curvus` (bent)
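AUC has an equivalent probabilistic reading. It is the chance that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as half. The sketch below computes that probability directly over all positive-negative pairs. That approach is O(P x N) and is meant for illustration, not large datasets. The inputs are made up.

```python
def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y != 1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2]))  # 5 of 6 pairs ranked correctly
```

A value of 0.5 corresponds to random ranking, and 1.0 corresponds to perfect separation of the classes.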
confusion matrix
A table showing prediction results versus actual values
"The confusion matrix revealed the model confused cats with dogs."
Origin: From Latin `confusio` (mixing together) + `matrix` (womb, breeding female), from `mater` (mother)
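A confusion matrix can be built by counting (actual, predicted) pairs. In this sketch, rows are actual classes and columns are predicted classes, following the order of `labels`. The cat/dog predictions are made up to match the example sentence.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows = actual class, columns = predicted class, both ordered by `labels`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, pred)] for pred in labels] for actual in labels]


labels = ["cat", "dog"]
y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "cat", "dog"]
for label, row in zip(labels, confusion_matrix(y_true, y_pred, labels)):
    print(label, row)
# cat [1, 1]
# dog [1, 2]
```

The off-diagonal cells count the mistakes. Here one cat was called a dog, and one dog was called a cat.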
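bias-variance tradeoff
The balance between underfitting (high bias) and overfitting (high variance)
"Understanding the bias-variance tradeoff guides model complexity decisions."
Origin: From French `biais` (slant) + Latin `variare` (to change) + `trade` (Middle Low German `trade`, track, course; related to Old English `tredan`, to tread) + `off` (Old English `of`, away)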
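For squared-error loss, the tradeoff can be stated exactly. Assume the data is generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and that the test noise is independent of the training data. Taking expectations over training sets and noise, the expected error of a fitted model $\hat f$ at a point $x$ splits into three terms:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Simpler models tend to raise the first term (underfitting), while more flexible models tend to raise the second (overfitting). No model can reduce the third term.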
regularization
Techniques to prevent overfitting by penalizing complexity
"Regularization prevented the model from memorizing training data."
Origin: From Latin `regula` (rule, straight piece of wood) + `-ization` suffix
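One common form of regularization is an L2 (ridge) penalty on the weights. The NumPy sketch below fits a linear model without an intercept, for brevity, by solving `(X^T X + lam * I) w = X^T y`. A larger `lam` shrinks the weights toward zero. The tiny dataset is made up.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Minimize ||y - Xw||^2 + lam * ||w||^2 (no intercept term)."""
    n_features = X.shape[1]
    # Solve the regularized normal equations rather than forming an explicit inverse
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
for lam in (0.0, 14.0):
    print(lam, ridge_fit(X, y, lam))  # the weight shrinks from 2.0 to 1.0
```

Other regularizers, such as an L1 (lasso) penalty or dropout in neural networks, follow the same idea of trading a little training fit for simpler models.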
normalization
Rescaling data to a standard range, such as [0, 1]
"Normalization kept features with large numeric ranges from dominating the model."
Origin: From Latin `norma` (carpenter's square, rule) + `-ization` suffix
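Min-max scaling is one common kind of normalization. It maps each feature linearly onto [0, 1]. Z-score standardization, which subtracts the mean and divides by the standard deviation, is a frequent alternative. Here is a minimal sketch for a single column:

```python
def min_max_scale(values):
    """Map values linearly so the minimum becomes 0.0 and the maximum 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

In practice, `lo` and `hi` are computed on the training data only and then reused to scale new data.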
imputation
Filling in missing data values
"Mean imputation replaced missing values with column averages."
Origin: From Latin `imputare` (to reckon, charge), from `in-` (in) + `putare` (to reckon, think)
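The mean imputation in the example can be sketched as follows. This version treats both `None` and NaN as missing, which is an assumption about how gaps are encoded, and it requires at least one observed value.

```python
import math


def mean_impute(column):
    """Replace None/NaN entries with the mean of the observed values."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in column if not is_missing(v)]
    mean = sum(observed) / len(observed)
    return [mean if is_missing(v) else v for v in column]


print(mean_impute([1.0, None, 3.0, float("nan")]))  # [1.0, 2.0, 3.0, 2.0]
```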
outlier detection
Identifying data points that differ significantly from others
"Outlier detection flagged suspicious transactions for review."
Origin: From `out` (Old English `ut`) + `lie` (Old English `licgan`) + Latin `detectio` (uncovering)
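A simple and widely used rule is Tukey's fences, chosen here as one illustrative method among many (z-scores and isolation forests are others). Any value more than k times the interquartile range (IQR) beyond the quartiles is flagged. The sketch relies on `statistics.quantiles` (Python 3.8+). The transaction amounts are made up.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]


print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))  # [95]
```

For this data, Q1 = 11 and Q3 = 13, so the fences are 8 and 16, and only 95 is flagged.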
clustering
Grouping similar data points together
"Clustering revealed three distinct customer segments."
Origin: From Old English `clyster` (bunch, group), probably related to `clot`
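k-means (Lloyd's algorithm) is one common clustering method, used here as an illustration. It alternates between two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. To keep the sketch deterministic, the initial centroids are passed in explicitly, whereas real implementations usually pick them randomly or with k-means++. The two-feature "customer" points are made up.

```python
import numpy as np


def k_means(X, init_centroids, n_iter=10):
    """Lloyd's algorithm; returns (labels from the last assignment step, centroids)."""
    centroids = np.array(init_centroids, dtype=float)  # copy, so the input is untouched
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels, centroids


X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = k_means(X, X[[0, 3]])
print(labels)     # [0 0 0 1 1 1]
print(centroids)  # roughly [[0.33, 0.33], [10.33, 10.33]]
```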
classification
Predicting which category a data point belongs to
"Classification determines whether an email is spam or legitimate."
Origin: From Latin `classis` (class, division) + `facere` (to make)
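A nearest-centroid classifier is one of the simplest classification methods, and it is chosen here purely as an illustration. It learns one average feature vector per class and assigns each new point to the closest one. The two features, for example the number of links in an email and the count of the word "free", and all of the data are hypothetical.

```python
import numpy as np


def fit_centroids(X, y):
    """Compute one mean feature vector (centroid) per class label."""
    classes = sorted(set(y))
    y = np.asarray(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def predict(X, classes, centroids):
    """Label each row of X with the class of its nearest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return [classes[i] for i in d.argmin(axis=1)]


X_train = np.array([[0, 0], [1, 0], [8, 5], [9, 6]], dtype=float)
y_train = ["ham", "ham", "spam", "spam"]
classes, centroids = fit_centroids(X_train, y_train)
print(predict(np.array([[0.5, 1.0], [7.0, 4.0]]), classes, centroids))  # ['ham', 'spam']
```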
regression
Predicting a continuous numerical value
"Regression models forecast next quarter's sales figures."
Origin: From Latin `regredi` (to go back), from `re-` (back) + `gradi` (to step, walk)
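The simplest case is a straight line fitted by ordinary least squares. The slope is the covariance of x and y divided by the variance of x. The quarterly sales figures below are made up so that the answer is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


quarters = [1, 2, 3, 4]
sales = [100, 110, 120, 130]
a, b = fit_line(quarters, sales)
print(a, b)       # 90.0 10.0
print(a + b * 5)  # forecast for quarter 5: 140.0
```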
time series
Data points indexed in time order
"Time series analysis detected seasonal patterns in demand."
Origin: From Old English `tima` (time) + Latin `series` (row, chain), from `serere` (to join)
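Two basic time-series operations are a trailing moving average, which smooths out noise, and seasonal differencing, which subtracts the value one season earlier so that a repeating pattern cancels out. The demand values below are made up, with a period of 2.

```python
def moving_average(series, window):
    """Trailing moving average; positions without a full window are omitted."""
    return [sum(series[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(series))]


def seasonal_difference(series, period):
    """series[t] - series[t - period]: removes a pattern that repeats every `period` steps."""
    return [series[i] - series[i - period] for i in range(period, len(series))]


demand = [10, 20, 10, 20, 12, 22]
print(moving_average(demand, 2))       # [15.0, 15.0, 15.0, 16.0, 17.0]
print(seasonal_difference(demand, 2))  # [0, 0, 2, 2]
```

After differencing, only the change between seasons remains. Here that change is the rise of 2 in the last season.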
anomaly detection
Identifying unusual patterns that don't conform to expected behavior
"Anomaly detection caught the security breach within minutes."
Origin: From Greek `anomalia` (unevenness), from `an-` (not) + `homalos` (even)
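One simple way to define "expected behavior" in a stream is the recent past. This sketch flags a point when it lies more than `threshold` standard deviations from the mean of the preceding `window` values. The window size, the threshold, and the login counts are all illustrative assumptions.

```python
import statistics


def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Indices whose value is more than `threshold` std devs from the trailing-window mean."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window : i]
        mu = statistics.mean(history)
        sigma = statistics.stdev(history)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


logins_per_minute = [20, 22, 21, 19, 20, 21, 22, 20, 95, 21]
print(rolling_zscore_anomalies(logins_per_minute))  # [8]
```

The spike at index 8 is flagged. The return to 21 at index 9 is not, because the spike inflates that window's standard deviation.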
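ETL
Extract, Transform, Load: the data pipeline process that pulls data from sources, cleans and reshapes it, and writes it to a target store
"The ETL pipeline processes millions of records nightly."
Origin: Acronym for `Extract, Transform, Load`: Latin `extractus` (drawn out) + `transformare` (to change shape) + Old English `hladan` (to load)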
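Here is a toy end-to-end sketch that uses only the standard library. It extracts rows from CSV text, transforms them by dropping incomplete rows, casting types, and normalizing currency codes, and then loads the result into an in-memory SQLite table. The schema, the table name `payments`, and the data are hypothetical.

```python
import csv
import io
import sqlite3

RAW = """id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows missing an amount, cast types, upper-case currency codes."""
    return [(int(r["id"]), float(r["amount"]), r["currency"].upper())
            for r in rows if r["amount"]]


def load(records, conn):
    """Load: insert the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (id INTEGER, amount REAL, currency TEXT)")
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW)), conn)
print(conn.execute("SELECT * FROM payments ORDER BY id").fetchall())
# [(1, 10.5, 'USD'), (3, 7.25, 'EUR')]
```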
data pipeline
A sequence of data-processing steps in which each step's output becomes the next step's input
"The data pipeline cleans and transforms the raw input."
Origin: From `pipe` (Old English `pipe` from Latin `pipare` to chirp) + `line` (Latin `linea`)
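In code, a pipeline can be modeled as function composition, where each step receives the previous step's output. The helper `make_pipeline` and the three cleaning steps below are hypothetical names chosen for illustration.

```python
from functools import reduce


def make_pipeline(*steps):
    """Chain steps so each one's output becomes the next one's input."""
    def run(data):
        return reduce(lambda acc, step: step(acc), steps, data)
    return run


def strip_whitespace(rows):
    return [r.strip() for r in rows]


def drop_empty(rows):
    return [r for r in rows if r]


def lowercase(rows):
    return [r.lower() for r in rows]


clean = make_pipeline(strip_whitespace, drop_empty, lowercase)
print(clean(["  Alice ", "", "BOB", "   "]))  # ['alice', 'bob']
```

Because each step is an ordinary function, a step can be tested on its own, and steps can be reordered or replaced without changing the others.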
data visualization
The representation of an object, situation, or set of information as a chart or other image
"Data visualization helps analysts identify trends."
Origin: From Latin `visualis` (of sight), from `visus` (sight), from `videre` (to see)