Categorical independent variables can be used in a regression analysis, but first they need to be coded by one or more dummy variables (also called tag variables). Each such dummy variable takes only the value 0 or 1 (although in ANOVA using Regression, we describe an alternative coding that takes the values 0, 1, or -1).

Example

Example 1: Create a regression model for the data in range A3:D19 of Figure 1.

Age is a continuous random variable, while Party affiliation and Gender are categorical random variables. There are three possible values for the Party affiliation variable and two possible values for Gender.

In general, if a categorical variable takes k distinct values, the model requires k – 1 dummy variables to code it. The value without a dummy of its own is identified by all of its dummies being 0.

Since Gender takes two values (Male and Female), one dummy variable, called Gender1, is sufficient to code Gender. It is defined as follows:

Gender1 = 1 if Gender is Male and Gender1 = 0 otherwise (i.e. if Gender is Female)

Since Party takes three values (Rep, Dem, Ind), two dummy variables, called Party1 and Party2, are needed to code Party. They are defined as follows:

Party1 = 1 if Party is Rep and Party1 = 0 otherwise
Party2 = 1 if Party is Dem and Party2 = 0 otherwise

An Independent is therefore coded as Party1 = 0, Party2 = 0.

The resulting coding is shown in range F3:J19 of Figure 1. We can now perform regression analysis on this range. The output from the Real Statistics Linear Regression data analysis tool on this input is shown in Figure 2.

Figure 2 – Regression with categorical data

The model can predict the income of a 25-year-old woman who is a Democrat, provided you recognize that the coding is Age = 25, Gender1 = 0, Party1 = 0, Party2 = 1. From Figure 3, you can see that the model forecasts an income of 24,494 for this person (cell J22). This value is calculated by the formula =TREND(J4:J19,F4:I19,F22:I22).
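As a rough illustration outside Excel, the same workflow can be sketched in plain Python. The sketch codes Gender and Party with k – 1 dummies, then fits ordinary least squares with an intercept (which is what TREND does with its default settings). Finally it predicts the income of a 25-year-old female Democrat.

Everything here is an assumption for illustration. The helper names (`dummies`, `code_row`, `fit_ols`, `predict`) are hypothetical. The respondents and incomes are made up and generated from an invented linear rule; they are not the Figure 1 data. As a result, the prediction will not match the 24,494 above.

```python
from typing import List, Sequence

# Level orderings chosen to match the coding in the text; the LAST level is the
# reference category, which is coded as all zeros.
GENDER_LEVELS = ("Male", "Female")      # Gender1 = 1 for Male
PARTY_LEVELS = ("Rep", "Dem", "Ind")    # Party1 = 1 for Rep, Party2 = 1 for Dem


def dummies(value: str, levels: Sequence[str]) -> List[float]:
    """Code one categorical value with len(levels) - 1 dummy (0/1) variables."""
    if value not in levels:
        raise ValueError(f"unknown level {value!r}")
    return [1.0 if value == level else 0.0 for level in levels[:-1]]


def code_row(age: float, gender: str, party: str) -> List[float]:
    """Return [Age, Gender1, Party1, Party2] for one respondent."""
    return [float(age)] + dummies(gender, GENDER_LEVELS) + dummies(party, PARTY_LEVELS)


def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients [b0, b1, ..., bm], intercept first.

    Solves the normal equations (X'X)b = X'y by Gauss-Jordan elimination with
    partial pivoting; adequate for a handful of predictors, not large problems.
    """
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    tol = 1e-9 * max(abs(v) for row in A for v in row)  # relative singularity threshold
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) <= tol:
            raise ValueError("X'X is singular: too many dummies or collinear columns")
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(p):                       # clear this column in every other row
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, p):
                    A[r][c] -= f * A[col][c]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(p)]


def predict(coefs: List[float], x: List[float]) -> float:
    """Evaluate b0 + b1*x1 + ... + bm*xm for one coded row."""
    return coefs[0] + sum(c * xi for c, xi in zip(coefs[1:], x))


# Hypothetical respondents (NOT the Figure 1 data).
people = [
    (23, "Male", "Rep"), (31, "Female", "Dem"), (45, "Male", "Ind"),
    (52, "Female", "Rep"), (38, "Female", "Ind"), (27, "Male", "Dem"),
    (60, "Male", "Rep"), (34, "Female", "Dem"),
]
X = [code_row(*person) for person in people]
# Hypothetical incomes from an invented exact linear rule, so OLS should recover
# the coefficients [5000, 600, 2000, -1000, 1500] up to rounding error.
y = [5000 + 600 * a + 2000 * g1 - 1000 * p1 + 1500 * p2 for a, g1, p1, p2 in X]

coefs = fit_ols(X, y)
# Age = 25, Gender1 = 0, Party1 = 0, Party2 = 1, as in the text
income = predict(coefs, code_row(25, "Female", "Dem"))
print(round(income, 2))
```

The incomes were generated exactly from the invented rule. The prediction should therefore equal 5000 + 600 × 25 + 1500 = 21,500, up to floating-point rounding.

The sketch also shows why only k – 1 dummies are used. Adding a third Party dummy alongside the intercept makes the three Party columns sum to the intercept column, so X'X becomes singular and `fit_ols` raises a `ValueError`.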