{"id":25524626,"date":"2022-05-18T20:00:51","date_gmt":"2022-05-18T14:30:51","guid":{"rendered":"https:\/\/entri.app\/blog\/?p=25524626"},"modified":"2022-05-27T13:39:05","modified_gmt":"2022-05-27T08:09:05","slug":"introduction-to-statistics-for-machine-learning","status":"publish","type":"post","link":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/","title":{"rendered":"Introduction to Statistics for Machine Learning"},"content":{"rendered":"<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_79_2 counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<label for=\"ez-toc-cssicon-toggle-item-69e8f44a3e2d5\" class=\"ez-toc-cssicon-toggle-label\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/label><input type=\"checkbox\"  id=\"ez-toc-cssicon-toggle-item-69e8f44a3e2d5\"  aria-label=\"Toggle\" \/><nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" 
href=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#1_Why_R\" >1) Why R<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#2_Install_Packages_in_R\" >2) Install Packages in R<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#3_Data_Import_and_Manipulation\" >3) Data Import and Manipulation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#4_Basic_Math\" >4) Basic Math<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#5_Probability\" >5) Probability<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#6_Discrete_Random_Variables\" >6) Discrete Random Variables<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#7_Continuous_Random_Variables\" >7) Continuous Random Variables<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#8_Sampling_Distribution_of_Means\" >8) Sampling Distribution of Means<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#9_Central_Limit_Theorem_CLT\" >9) Central 
Limit Theorem (CLT)<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#10_CLT_in_Practice\" >10) CLT in Practice<\/a><\/li><\/ul><\/nav><\/div>\n<p>The first step to using statistics for machine learning and data science is to understand what statistics are, how they\u2019re used, and what their limitations are when it comes to machine learning and data science. In this article, we\u2019ll examine several important statistics concepts with an eye on how you can use them in your own work with machine learning and data science, and we\u2019ll also look at the benefits of each concept. <span>Statistics can be intimidating, but that doesn\u2019t mean you shouldn\u2019t learn them. 
In fact, you should learn them, because without the basics of statistics, <a href=\"https:\/\/entri.app\/blog\/artificial-intelligence-and-machine-learning-technologies-in-sports\/\" target=\"_blank\" rel=\"noopener\">machine learning<\/a> algorithms can\u2019t help you make better decisions about your products and services. Luckily, statistics aren\u2019t that hard once you get past the complicated terminology and begin to understand how to interpret them properly. Statistics for machine learning is an important subject to master if you want to become an effective machine learning practitioner. Unfortunately, there\u2019s no shortcut; you\u2019ll need to devote significant time and effort to build your proficiency in this discipline, but it will be worth it! Here are ten topics to get you started on the right foot.<\/span><\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">Get the latest updates on machine learning in the Entri app<\/a><\/p>\n<h2><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-25520997 size-full\" src=\"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/04\/Python-and-Machine-Learning-Rectangle.png\" alt=\"\" width=\"970\" height=\"250\" srcset=\"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/04\/Python-and-Machine-Learning-Rectangle.png 970w, https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/04\/Python-and-Machine-Learning-Rectangle-300x77.png 300w, https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/04\/Python-and-Machine-Learning-Rectangle-768x198.png 768w, https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/04\/Python-and-Machine-Learning-Rectangle-750x193.png 750w\" sizes=\"auto, (max-width: 970px) 100vw, 970px\" \/><\/a><\/h2>\n<h2><span class=\"ez-toc-section\" id=\"1_Why_R\"><\/span><strong>1) Why R<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Why learn R? 
Simply put, because it is one of the best statistical programming languages available today. It has many statistical features that are missing in other tools, and its active community of users is ever-growing. This creates a virtuous cycle where it keeps on getting better, and people keep on learning about it. If you don\u2019t believe me, read up on how companies such as Google and LinkedIn use it. Even if you don\u2019t end up using R yourself, knowing how to interpret code in R will be a definite advantage if you encounter R-generated graphs or results from others. Plus, many data scientists work in R, so they may expect you to know your way around it. So whether you choose to use R or not, knowing how to use it can only help your career prospects. That said, <a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">there are<\/a> many reasons why you might want to try out R. Two of them stand out. First, it is free: there is no need to shell out thousands of dollars for proprietary software licenses or even expensive hardware when you already own a laptop computer. Second, it has powerful packages for advanced analytics: many R packages contain algorithms that rival those found in very expensive commercial software like SAS and SPSS. For example, check out caret, which gives you a common interface to advanced machine learning algorithms like random forests, boosting, and neural networks. New packages are being added all the time by an extremely active user base that contributes through CRAN and GitHub. 
You could also create your own package if there isn\u2019t one that does exactly what you need.<\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">To know more about machine learning in the Entri app<\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"2_Install_Packages_in_R\"><\/span><strong>2) Install Packages in R<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>R has a huge repository of packages that make it one of the most powerful languages in data science. In fact, there are over 17,000 packages available through CRAN. These packages give you access to statistical models such as linear regression, clustering algorithms, time series forecasting and much more, and they provide endless opportunities to dive deeper into R. To use a package, download it with install.packages(\"packagename\"), or install it from GitHub with devtools::install_github(\"user\/repo\"). Once it is installed, load it by running library(packagename). 
You can find <a href=\"https:\/\/entri.app\/blog\/artificial-intelligence-and-machine-learning-technologies-in-sports\/\" target=\"_blank\" rel=\"noopener\">additional information<\/a> on how to use each package in its documentation. For further information about how to learn R, check out The Ultimate Beginner&#8217;s Guide to Learning Data Science with R. Below is a list of some essential packages for getting started. It is by no means exhaustive, but it\u2019s a great place to start.<\/p>\n<ol>\n<li>caret &#8211; Classification And REgression Training<\/li>\n<li>ggplot2 &#8211; Create elegant data visualizations<\/li>\n<li>dplyr &#8211; Fast, simple data manipulation<\/li>\n<li>purrr &#8211; Functional programming tools for applying functions to lists and vectors<\/li>\n<li>tidyr &#8211; Tidy and reshape messy data<\/li>\n<li>stringr &#8211; String manipulation<\/li>\n<li>lubridate &#8211; Date &amp; time manipulation<\/li>\n<li>reshape2 &#8211; Flexible reshaping<\/li>\n<li>foreign &#8211; Read data stored by other statistical software (e.g., Stata, SAS)<\/li>\n<li>Hmisc &#8211; Miscellaneous functions for data analysis<\/li>\n<li>rpart &#8211; CART &amp; tree-based models<\/li>\n<li>MASS &#8211; Functions and datasets supporting the book Modern Applied Statistics with S<\/li>\n<li>survival &#8211; Survival analysis<\/li>\n<li>plyr &#8211; Tools for splitting, applying and combining data<\/li>\n<li>zoo &#8211; Ordered observations, including regular and irregular time series<\/li>\n<li>DBI &#8211; Database interface<\/li>\n<li>pryr &#8211; Tools for inspecting R objects and how the language works<\/li>\n<\/ol>\n<p style=\"text-align: center\"><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">Enroll in our latest machine learning course in the Entri app<\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"3_Data_Import_and_Manipulation\"><\/span><strong>3) Data Import and Manipulation<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Before you can start building machine learning models, you need your data in a format that your programming language can access. Ideally, that means storing it in a local file in comma-separated values (CSV) or similar format. In R, packages like dplyr and tidyr help you manipulate and tidy your data before it&#8217;s ready for modeling; in Python, a library like pandas plays the same role. If all you have is PDFs or text files, take a look at Tabula or other tools to get them into a usable format. If there are labels associated with your data, like zip codes or census tract identifiers, keep those as well so that you can group records correctly and check that none are dropped along the way. You may also want to create dummy variables, which are 0\/1 indicator columns, for categorical fields, and to derive new fields from existing ones. For example, maybe you have an age variable but not one for the year of birth, which could be useful information. Relevant, well-prepared features usually help an algorithm, while irrelevant ones add noise and can encourage overfitting, so add information with care. 
If you\u2019re new to <a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">machine learning<\/a>, chances are good that you\u2019ll find yourself going down a few blind alleys before finding something that works well enough to publish on Kaggle or deploy at scale. That is a normal part of learning to build better models and avoid common pitfalls. There&#8217;s no shortage of applications for machine learning algorithms, from optimizing customer experiences to detecting fraud and predicting crop yields. Some of them can be tackled with just a laptop, ranging from basic statistical inference problems to more complex tasks like image recognition and natural language processing (NLP). The data science field is growing fast, but it&#8217;s not always easy to break into without an advanced degree or years of work experience in statistics or computer science.<\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">Start your coding preparation with Entri app<\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"4_Basic_Math\"><\/span><strong>4) Basic Math<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>You can&#8217;t do statistics without basic math skills. Start with arithmetic: learn how to add, subtract, multiply and divide. Next, get comfortable with fractions, decimals and percentages, followed by ratios and proportions. You should be able to do simple cases in your head, without a calculator. Then move on to exponents, square roots and logarithms. These are more advanced concepts but are crucial for understanding other topics in statistics. Practice regularly, as it will get easier with time. Don&#8217;t worry if any of these topics feel difficult at first; they&#8217;ll make sense eventually. 
Just keep practicing! It will take time, but all of these skills <a href=\"https:\/\/entri.app\/blog\/applications-of-machine-learning-in-healthcare-industry\/\" target=\"_blank\" rel=\"noopener\">are necessary<\/a> for learning statistics. For more practice problems, check out Khan Academy&#8217;s Introduction to Statistics module. It&#8217;s completely free and does an excellent job explaining some complicated concepts like standard deviation. In addition to Khan Academy, another great resource is DataCamp. They offer interactive coding lessons that walk you through key statistical concepts. They also offer free trials so you can see what their platform is like before committing. Resources like these cover much of what you need to know to start using R, Python or SAS for data analysis. With a little bit of practice, you should be ready to start working on your own machine learning projects!<\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">Get the latest updates on machine learning in the Entri app<\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"5_Probability\"><\/span><strong>5) Probability<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Probability is very important in machine learning. The field itself is extremely mathematical, and probability gives us a way to reason about uncertainty, which makes it one of the main tools we use when building models. In fact, you\u2019ll notice that every time we build a model, we have to make assumptions about data; whether or not those assumptions are true has a direct effect on our model&#8217;s performance. When dealing with probabilities, it\u2019s important that our assumptions are made as carefully as possible; if they aren\u2019t <a href=\"https:\/\/entri.app\/blog\/artificial-intelligence-and-machine-learning-technologies-in-sports\/\" target=\"_blank\" rel=\"noopener\">careful enough<\/a>, then we could draw an incorrect conclusion about what the data tells us. 
For example, let\u2019s say that we want to determine whether or not a certain drug will work well for someone who has cancer. We run tests on people who have cancer and give each of them one of two different drugs (Drug A or Drug B). We keep track of how well each drug works by recording which people live after taking each drug (and which die). This is called survival analysis and it gives us a lot of information about how effective each drug is at treating cancer. But there are some problems with using survival alone as a way to measure effectiveness, the main one being that factors other than treatment type, such as age or how advanced the disease is, may also affect who survives. If those factors differ between the two groups, the comparison can mislead us.<\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">To know more about machine learning in the Entri app<\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"6_Discrete_Random_Variables\"><\/span><strong>6) Discrete Random Variables<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Quantitative variables are either discrete or continuous. Continuous variables, like height and weight, can take on an infinite number of values within a certain range. Discrete variables, on the other hand, take values from a countable set, such as the whole numbers; age in completed years is an example of a discrete variable because it moves in steps. Once you pass 100 years old, the next possible value is 101, with nothing in between. In machine learning and statistics, we often deal with discrete random variables. These are random variables whose possible values can be listed one by one. The number of children in a family, for instance, will always be a whole number (zero, one, two and so on). This is different from continuous <a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">random variables<\/a>, which can assume any value within some interval. The amount of money in your bank account, for practical purposes, could take any value between $0 and $1 million. 
When working with discrete random variables, it\u2019s important to understand how they behave statistically. For instance, if you flip a coin 20 times, what is the probability that exactly 15 heads will appear? What about exactly 8 heads? How about zero heads? To answer these questions, you need to know about probability distributions. A probability distribution is simply a function that maps every possible outcome of an experiment to its probability of occurring. You find the probability of an event by adding up the probabilities of the outcomes that make it up. With discrete random variables, this function is called a probability mass function, and the probabilities of all possible outcomes add up to 1. For the coin flips, the relevant distribution is the binomial: assuming a fair coin, the probability of exactly 15 heads in 20 flips is C(20, 15) &times; 0.5<sup>20<\/sup>, which is 15,504 out of 1,048,576, or about 1.5%.<\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">Get the latest updates on machine learning in the Entri app<\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"7_Continuous_Random_Variables\"><\/span><strong>7) Continuous Random Variables<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>A random variable is a quantity whose value depends on the outcome of a random process. A common example is the result of a coin toss or the amount won on a bet. A continuous random variable is one that can take any numerical value within a specified interval of values. These variables are often visualized with graphs such as histograms, which show how the results of your experiments are spread across ranges of values. Continuous random variables are important because they let you work with <a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">continuous distributions<\/a> such as the Gaussian (normal) distribution, which are described by probability density functions. To understand these, let\u2019s look at some examples of different types of data we might find in machine learning applications. 
For simplicity, we\u2019ll discuss two dimensions, the independent and dependent variables, and assume they exist on real-valued scales. In reality, many data sets are multi-dimensional. That means each observation has multiple attributes; examples include high-dimensional images, audio recordings, text documents or even points in space representing 3D objects like a tumor in an MRI scan.<\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">To know more about machine learning in the Entri app<\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"8_Sampling_Distribution_of_Means\"><\/span><strong>8) Sampling Distribution of Means<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>A sampling distribution is the distribution of a statistic, such as the mean, computed over many possible samples of the same size drawn from a population. The sampling distribution of the mean describes how the sample mean varies from sample to sample; its standard deviation is called the standard error and equals the population standard deviation divided by the square root of the sample size. In statistics, you&#8217;re likely to see several kinds: sampling distributions of means, variances, proportions, ratios and differences between variances or means. To find out how all these distributions work, Wikipedia is a good starting point. If you\u2019re having trouble getting comfortable with these types of math concepts (for many people, statistics can be confusing at first), look up the empirical rule. It says that for a normal distribution, about 68%, 95% and 99.7% of values fall within one, two and three standard deviations of the mean, which makes it an easy-to-follow way to understand probability ranges. Once you&#8217;ve got those down, the rest is mostly <a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">number crunching<\/a>. You&#8217;ll have to do some research on your own if you want to get more advanced than that. There are lots of great tutorials online that cover a variety of topics. Remember, though: the best thing you can do is practice! You don&#8217;t need expensive software to run experiments or play around with data sets; there are plenty of free tools available online. Just make sure you check your data for outliers, which can skew results such as the mean! 
When in doubt, go back to what works best: keep reading and writing about stats until it starts making sense.<\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">Get the latest updates on machine learning in the Entri app<\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"9_Central_Limit_Theorem_CLT\"><\/span><strong>9) Central Limit Theorem (CLT)<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The CLT states that if you repeatedly draw samples of size n from a population with mean &mu; and variance &sigma;&sup2;, the distribution of the sample means approaches a normal distribution with mean &mu; and variance &sigma;&sup2;\/n as n grows. (This works for many population distributions, not only the normal one, as long as the variance is finite.) Combined with the empirical rule, under which about 68% of the values of a normal distribution lie within one standard deviation of its mean and about 99.7% within three, this tells us that about 68% of sample means fall within one standard error (&sigma;\/&radic;n) of &mu; and about 99.7% fall within three. In practice, you can simulate this by drawing random noise from a normal distribution to stand in for the population. Then we can check if our result is more than 3 standard errors away from what was expected. If it is, we know it\u2019s unlikely to have <a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">been produced<\/a> by chance alone. This method gives only a rough check of whether something is too unlikely to be chance; it does not explain why the result is off. If we run hundreds or thousands of tests, a few results (about 0.3%) will land beyond three standard errors by chance alone, so a single extreme result deserves more caution than a pattern that repeats. For example, imagine you wanted to test whether someone was lying about their age. You ask them how old they are and they say 29. You might think that&#8217;s suspicious because most people would lie about being younger rather than older, so you decide to do a test by asking 100 people at random what their ages are. 
Suppose those 100 answers have a mean of 30 years and a standard deviation of 10 years (illustrative numbers). By the CLT, the mean of a sample of 100 people has a standard error of 10\/&radic;100 = 1 year, so a sample mean of 33 would be three standard errors from 30 and would be surprising. A single answer of 29, however, is only a tenth of a standard deviation below the mean, which is entirely ordinary, so maybe it&#8217;s worth thinking twice before accusing someone of lying about their age!<\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">To know more about machine learning in the Entri app<\/a><\/p>\n<h2><span class=\"ez-toc-section\" id=\"10_CLT_in_Practice\"><\/span><strong>10) CLT in Practice<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>One of my favorite tools for regression models is something called the least absolute shrinkage and selection operator (LASSO). LASSO is one of many ways you can control overfitting in your machine learning model, but it&#8217;s also important from a statistical perspective. LASSO is a form of regularization: when you fit your model, instead of only trying to find parameters that minimize errors as much as possible, you minimize the errors plus a penalty on the sum of the absolute values of the coefficients for the features in <a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">your dataset<\/a>. The penalty shrinks coefficients toward zero and can set some of them exactly to zero, which removes those features from the model. In simple terms, if two features are correlated with each other in your data (they have a high correlation coefficient), fitting both might overfit and not give accurate predictions. LASSO helps limit overfitting by tending to keep only the features that are truly necessary. That being said, there are many techniques out there for controlling overfitting, including other forms of regularization and cross-validation, and LASSO isn&#8217;t always best suited for every situation. I&#8217;d encourage you to read more about these techniques before deciding which method is best for your situation. If you are interested in learning new coding skills, the Entri app will help you to acquire them very easily. 
Entri app is following a structural study plan so that the students can learn very easily. If you don&#8217;t have a coding background, it won&#8217;t be any problem. You can download the Entri app from the google play store and enroll in your favorite course.<\/p>\n<p style=\"text-align: center\"><a href=\"https:\/\/entri.sng.link\/Bcofz\/w75k\/zvbw\">Get the latest updates on machine learning in the Entri app<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The first step to using statistics for machine learning and data science is to understand what statistics are, how they\u2019re used, and what their limitations are when it comes to machine learning and data science. In this article, we\u2019ll examine several important statistics concepts with an eye on how you can use them in your [&hellip;]<\/p>\n","protected":false},"author":93,"featured_media":25524633,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[802,1864],"tags":[],"class_list":["post-25524626","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-articles","category-data-science-ml"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Introduction to Statistics for Machine Learning - Entri Blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Introduction to Statistics for Machine Learning - Entri Blog\" \/>\n<meta property=\"og:description\" content=\"The first step to using statistics for machine learning and data science is to 
understand what statistics are, how they\u2019re used, and what their limitations are when it comes to machine learning and data science. In this article, we\u2019ll examine several important statistics concepts with an eye on how you can use them in your [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"Entri Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/entri.me\/\" \/>\n<meta property=\"article:published_time\" content=\"2022-05-18T14:30:51+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2022-05-27T08:09:05+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/05\/Untitled-40-1.png\" \/>\n\t<meta property=\"og:image:width\" content=\"820\" \/>\n\t<meta property=\"og:image:height\" content=\"615\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Akhil M G\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@entri_app\" \/>\n<meta name=\"twitter:site\" content=\"@entri_app\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Akhil M G\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/\"},\"author\":{\"name\":\"Akhil M G\",\"@id\":\"https:\/\/entri.app\/blog\/#\/schema\/person\/875646423b2cce93c1bd5bc16850fff6\"},\"headline\":\"Introduction to Statistics for Machine Learning\",\"datePublished\":\"2022-05-18T14:30:51+00:00\",\"dateModified\":\"2022-05-27T08:09:05+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/\"},\"wordCount\":2848,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/entri.app\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/05\/Untitled-40-1.png\",\"articleSection\":[\"Articles\",\"Data Science and Machine Learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/\",\"url\":\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/\",\"name\":\"Introduction to Statistics for Machine Learning - Entri 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/entri.app\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/05\/Untitled-40-1.png\",\"datePublished\":\"2022-05-18T14:30:51+00:00\",\"dateModified\":\"2022-05-27T08:09:05+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#primaryimage\",\"url\":\"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/05\/Untitled-40-1.png\",\"contentUrl\":\"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/05\/Untitled-40-1.png\",\"width\":820,\"height\":615,\"caption\":\"Introduction to Statistics for Machine Learning\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/entri.app\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Entri Skilling\",\"item\":\"https:\/\/entri.app\/blog\/category\/entri-skilling\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Data Science and Machine Learning\",\"item\":\"https:\/\/entri.app\/blog\/category\/entri-skilling\/data-science-ml\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Introduction to Statistics for Machine 
Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/entri.app\/blog\/#website\",\"url\":\"https:\/\/entri.app\/blog\/\",\"name\":\"Entri Blog\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\/\/entri.app\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/entri.app\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/entri.app\/blog\/#organization\",\"name\":\"Entri App\",\"url\":\"https:\/\/entri.app\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/entri.app\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/entri.app\/blog\/wp-content\/uploads\/2019\/10\/Entri-Logo-1.png\",\"contentUrl\":\"https:\/\/entri.app\/blog\/wp-content\/uploads\/2019\/10\/Entri-Logo-1.png\",\"width\":989,\"height\":446,\"caption\":\"Entri App\"},\"image\":{\"@id\":\"https:\/\/entri.app\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/entri.me\/\",\"https:\/\/x.com\/entri_app\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/entri.app\/blog\/#\/schema\/person\/875646423b2cce93c1bd5bc16850fff6\",\"name\":\"Akhil M G\",\"url\":\"https:\/\/entri.app\/blog\/author\/akhil\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Introduction to Statistics for Machine Learning - Entri Blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/","og_locale":"en_US","og_type":"article","og_title":"Introduction to Statistics for Machine Learning - Entri Blog","og_description":"The first step to using statistics for machine learning and data science is to understand what statistics are, how they\u2019re used, and what their limitations are when it comes to machine learning and data science. In this article, we\u2019ll examine several important statistics concepts with an eye on how you can use them in your [&hellip;]","og_url":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/","og_site_name":"Entri Blog","article_publisher":"https:\/\/www.facebook.com\/entri.me\/","article_published_time":"2022-05-18T14:30:51+00:00","article_modified_time":"2022-05-27T08:09:05+00:00","og_image":[{"width":820,"height":615,"url":"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/05\/Untitled-40-1.png","type":"image\/png"}],"author":"Akhil M G","twitter_card":"summary_large_image","twitter_creator":"@entri_app","twitter_site":"@entri_app","twitter_misc":{"Written by":"Akhil M G","Est. 
reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#article","isPartOf":{"@id":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/"},"author":{"name":"Akhil M G","@id":"https:\/\/entri.app\/blog\/#\/schema\/person\/875646423b2cce93c1bd5bc16850fff6"},"headline":"Introduction to Statistics for Machine Learning","datePublished":"2022-05-18T14:30:51+00:00","dateModified":"2022-05-27T08:09:05+00:00","mainEntityOfPage":{"@id":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/"},"wordCount":2848,"commentCount":0,"publisher":{"@id":"https:\/\/entri.app\/blog\/#organization"},"image":{"@id":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/05\/Untitled-40-1.png","articleSection":["Articles","Data Science and Machine Learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/","url":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/","name":"Introduction to Statistics for Machine Learning - Entri 
Blog","isPartOf":{"@id":"https:\/\/entri.app\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#primaryimage"},"image":{"@id":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/05\/Untitled-40-1.png","datePublished":"2022-05-18T14:30:51+00:00","dateModified":"2022-05-27T08:09:05+00:00","breadcrumb":{"@id":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#primaryimage","url":"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/05\/Untitled-40-1.png","contentUrl":"https:\/\/entri.app\/blog\/wp-content\/uploads\/2022\/05\/Untitled-40-1.png","width":820,"height":615,"caption":"Introduction to Statistics for Machine Learning"},{"@type":"BreadcrumbList","@id":"https:\/\/entri.app\/blog\/introduction-to-statistics-for-machine-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/entri.app\/blog\/"},{"@type":"ListItem","position":2,"name":"Entri Skilling","item":"https:\/\/entri.app\/blog\/category\/entri-skilling\/"},{"@type":"ListItem","position":3,"name":"Data Science and Machine Learning","item":"https:\/\/entri.app\/blog\/category\/entri-skilling\/data-science-ml\/"},{"@type":"ListItem","position":4,"name":"Introduction to Statistics for Machine Learning"}]},{"@type":"WebSite","@id":"https:\/\/entri.app\/blog\/#website","url":"https:\/\/entri.app\/blog\/","name":"Entri 
Blog","description":"","publisher":{"@id":"https:\/\/entri.app\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/entri.app\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/entri.app\/blog\/#organization","name":"Entri App","url":"https:\/\/entri.app\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/entri.app\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/entri.app\/blog\/wp-content\/uploads\/2019\/10\/Entri-Logo-1.png","contentUrl":"https:\/\/entri.app\/blog\/wp-content\/uploads\/2019\/10\/Entri-Logo-1.png","width":989,"height":446,"caption":"Entri App"},"image":{"@id":"https:\/\/entri.app\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/entri.me\/","https:\/\/x.com\/entri_app"]},{"@type":"Person","@id":"https:\/\/entri.app\/blog\/#\/schema\/person\/875646423b2cce93c1bd5bc16850fff6","name":"Akhil M 
G","url":"https:\/\/entri.app\/blog\/author\/akhil\/"}]}},"_links":{"self":[{"href":"https:\/\/entri.app\/blog\/wp-json\/wp\/v2\/posts\/25524626","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/entri.app\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/entri.app\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/entri.app\/blog\/wp-json\/wp\/v2\/users\/93"}],"replies":[{"embeddable":true,"href":"https:\/\/entri.app\/blog\/wp-json\/wp\/v2\/comments?post=25524626"}],"version-history":[{"count":3,"href":"https:\/\/entri.app\/blog\/wp-json\/wp\/v2\/posts\/25524626\/revisions"}],"predecessor-version":[{"id":25525724,"href":"https:\/\/entri.app\/blog\/wp-json\/wp\/v2\/posts\/25524626\/revisions\/25525724"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/entri.app\/blog\/wp-json\/wp\/v2\/media\/25524633"}],"wp:attachment":[{"href":"https:\/\/entri.app\/blog\/wp-json\/wp\/v2\/media?parent=25524626"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/entri.app\/blog\/wp-json\/wp\/v2\/categories?post=25524626"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/entri.app\/blog\/wp-json\/wp\/v2\/tags?post=25524626"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}