Threats to Validity for Your Data Analytics Projects

•Internal

•External

•Construct

•Statistical Conclusion

Internal: An informative variable is missing. Bring in data from other sources.

External: A fixation variable makes the result look perfect, but the model may not generalize.

Construct: Class imbalance distorts the outcome.

Statistical Conclusion: Depending on the statistical measures used, the conclusion can be incorrect.

•Data Mining example: judge association rules by support, confidence, and lift

Internal Validity

Is your experiment (and model) internally valid?

What is the threat that the experiment (model and outcome) is internally invalid?

Example: reasons why the inference that the relationship between two variables is causal may be incorrect. [b]

Cause: Lack of informative variables

Solution: Bring data from other sources

External Validity

Is your experiment (and model) externally valid?

What is the threat that the experiment (model and outcome) is externally invalid?

“Study results may not apply to other groups.”

Cause: A fixation variable (one that makes the result look perfect in the study data)

Solution: Exclude the fixation variable from the study

Ref: https://en.wikipedia.org/wiki/External_validity
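The sketch below simulates this threat under an assumption: we read "fixation variable" as a variable that is tied to the outcome in the study group but not in other groups. `f` is such a hypothetical variable and `g` is an ordinary, noisy predictor. A toy one-feature classifier that picks the feature with the best training accuracy chooses `f`, scores perfectly on the study group, and drops to roughly chance on another group; excluding `f` leaves a model that does about equally well on both.

```python
import random

def make_group(n, fixation_tracks_label, rng):
    """Generate toy rows with a label, an informative feature g, and a hypothetical fixation variable f."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Informative feature: agrees with the label about 80% of the time.
        g = label if rng.random() < 0.8 else 1 - label
        # Hypothetical fixation variable: equals the label only in the study group.
        f = label if fixation_tracks_label else rng.randint(0, 1)
        rows.append({"f": f, "g": g, "label": label})
    return rows

def accuracy(rows, feature):
    """Accuracy of predicting the label as the value of a single feature."""
    return sum(r[feature] == r["label"] for r in rows) / len(rows)

def pick_feature(rows, candidates):
    """Toy one-feature 'model': choose the candidate with the best accuracy on rows."""
    return max(candidates, key=lambda c: accuracy(rows, c))

rng = random.Random(42)
study = make_group(2000, fixation_tracks_label=True, rng=rng)
other = make_group(2000, fixation_tracks_label=False, rng=rng)

with_f = pick_feature(study, ["f", "g"])   # fixation variable allowed
without_f = pick_feature(study, ["g"])     # fixation variable excluded
for name, feat in [("with fixation", with_f), ("without fixation", without_f)]:
    print(f"{name}: uses {feat!r}, study acc={accuracy(study, feat):.2f}, "
          f"other-group acc={accuracy(other, feat):.2f}")
```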

Construct Validity

Is your experiment (and model) valid by construction?

What is the threat that the experiment (model and outcome) is invalid by construction?

Example: In classification, if the data is imbalanced, the variables' estimated effects on the outcome can be invalid.

Cause: A construction problem (class imbalance)

Solution: Treat the data for imbalance (e.g., resampling or class weighting)
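A short sketch of why imbalance misleads and of one common treatment. With 5% positives (made-up data), a model that always predicts the majority class reaches 95% accuracy while never finding a single positive. Random oversampling, shown here as one option among several (undersampling, class weights, SMOTE), duplicates randomly chosen minority rows until the classes are the same size. The function name `random_oversample` is our own, not a library API.

```python
import random
from collections import Counter

def random_oversample(rows, label_key="label", seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class (simple random oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label_key], []).append(r)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for cls_rows in by_class.values():
        balanced.extend(cls_rows)
        balanced.extend(rng.choice(cls_rows) for _ in range(target - len(cls_rows)))
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy data: 50 positives (minority) and 950 negatives.
rows = [{"x": i, "label": 1 if i < 50 else 0} for i in range(1000)]
n_pos = sum(r["label"] == 1 for r in rows)

# A useless "model" that always predicts the majority class.
preds = [0 for _ in rows]
accuracy = sum(p == r["label"] for p, r in zip(preds, rows)) / len(rows)
minority_recall = sum(p == 1 and r["label"] == 1 for p, r in zip(preds, rows)) / n_pos
print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")

balanced = random_oversample(rows)
print(Counter(r["label"] for r in balanced))
```

In practice, resample only the training split: oversampling before the train/test split copies minority rows into the test set and inflates the scores.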

Statistical Conclusion Validity

Is your conclusion (from the experiment and the model) statistically valid, even when it was reached through statistical analysis?

What is the threat that the conclusion (from the experiment and the model) is invalid?

Example: In data mining, you considered only whether items are associated, but that does not give the full picture.

Solution: Evaluate each association rule by its support, confidence, and lift

Ref: https://www.analyticsvidhya.com/
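The three measures can be computed directly from transaction counts. For a rule A → B over N transactions, support is the fraction of transactions containing both A and B, confidence is support(A and B) / support(A), and lift is confidence / support(B). The sketch below uses made-up basket data; the function name `rule_metrics` is our own.

```python
from typing import List, Set, Tuple

def rule_metrics(transactions: List[Set[str]], antecedent: Set[str],
                 consequent: Set[str]) -> Tuple[float, float, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)                  # contain A
    n_b = sum(consequent <= t for t in transactions)                  # contain B
    n_ab = sum((antecedent | consequent) <= t for t in transactions)  # contain A and B
    if n_a == 0 or n_b == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_ab / n                  # P(A and B)
    confidence = n_ab / n_a             # P(B | A)
    lift = (n_ab * n) / (n_a * n_b)     # P(B | A) / P(B)
    return support, confidence, lift

# Made-up basket data for illustration only.
transactions = [
    {"tea", "coffee"}, {"tea", "coffee"}, {"tea", "coffee"}, {"tea"},
    {"coffee"}, {"coffee"}, {"coffee"}, {"coffee"}, {"coffee"}, {"milk"},
]
support, confidence, lift = rule_metrics(transactions, {"tea"}, {"coffee"})
print(f"support={support:.4f} confidence={confidence:.4f} lift={lift:.4f}")
```

Here the rule {tea} → {coffee} has a confidence of 0.75, which looks strong, yet its lift is 0.9375 (below 1): coffee is slightly less likely among tea buyers than among all baskets (0.8), so judging the rule by confidence alone would lead to the wrong conclusion.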
