Dimensionless weighted standard deviation good

#Dimensionless weighted standard deviation good full#

In my previous article about statistical averages, we discussed how you can describe your dataset with a few central values (mean, median and mode). That’s well and good… But there is a problem with statistical averages: they don’t tell you much about the statistical variability (in other words, the spread or dispersion) of your data. E.g. when two datasets have nearly the same mean, median and mode, then by looking only at these values, you could say that the two datasets are very similar to each other. But that may not be true: the second one can have much more spread, right? And in data science that is an important difference. That’s where statistical variability comes into play. In this article, I’ll show you my three favorite ways to discover the spread of your data. There are more, but these are the ones I use most often in real data science projects. Let’s dive in!

#The role of statistics in data discovery#

I use statistical measures quite often in the data discovery phase of my data projects. Looking at your data for the first time, the catch is always the same. If you have millions (or even billions) of data points, you won’t have time to go through everything line by line. For a human, millions of data points are too many to interpret, understand or remember.

In real life, when you meet a new person and start getting to know her, you use a few common formulas (“how are you”, “nice to meet you”, “what’s your name”, “what do you do”). Similarly, in statistics, when you start to make friends with your data, you use some frequently used metrics to get to know it: mean, median, standard deviation, percentiles, etc. These statistical measures won’t show you the full complexity of your data, but you’ll get a good grasp and a basic understanding of what it looks like.

Remember, statistics is about “compressing” a lot of information into a few numbers so our brain can process it more easily. Now, of course, there are no golden rules about what exact metrics you should use… But there are some best practices. Let me show you mine!

#Describing a data set with a few values#

Let’s take a small, one-dimensional dataset and describe it as well as possible, using the fewest numbers. If you could choose only one number (and if you think like most data scientists), this number would be the mean, which is 50 in this case. If you could choose another number, that could be the median, which is also 50. But as I said, you want to understand the spread of your dataset, too. So the third number to take a look at will be the standard deviation.
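To make the point about averages and spread concrete, here is a minimal Python sketch. The two datasets (`narrow` and `wide`) and the helper `summarize` are made up for this illustration and are not the article's original data. The sketch also assumes the population standard deviation (`statistics.pstdev`); the sample version (`statistics.stdev`) would give slightly larger values.

```python
import statistics

# Hypothetical example data (NOT the article's original datasets):
# both lists are centered on 50, but the second is far more spread out.
narrow = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]


def summarize(values):
    """Return the mean, median and population standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Assumption: population standard deviation (divide by n).
        "std": statistics.pstdev(values),
    }


narrow_summary = summarize(narrow)
wide_summary = summarize(wide)

print("narrow:", narrow_summary)
print("wide:  ", wide_summary)
```

Both summaries report a mean and a median of 50, so the averages alone cannot tell the two lists apart. The standard deviation can: it is roughly 1.41 for `narrow` and roughly 28.28 for `wide`.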