
Wednesday, February 20, 2019

Statistics Coursework

1st Hypothesis

For my first hypothesis I will investigate the relationship between the average number of TV hours watched per week by the pupils and their IQ. I am going to use the columns IQ and Average number of hours TV watched per week, taken from the Mayfield High datasheet. I think that there will be a relationship between them and will attempt to reveal it.

2nd Hypothesis

For my second hypothesis I will investigate the relationship between the average number of TV hours watched per week and weight (kg). I think that there will not be any major relationship between them, as they will not affect each other greatly.

I will present my analysis and the results in graphs and tables, and explain the results using the correlation of the graphs and the arrangement of the figures.

I will select a number of pupils to base my data on and will use stratified sampling to work out the correct number of male and female pupils needed to make the investigation fair.

Stratified Sampling

I do not want to use all of the data in the database for my analysis, so I will need to take a sample of the pupils in the school. I would like to take about 10% of the overall figure. I will also use stratified sampling so that the sample has the same proportion of males and females as the school, which makes it fair.

The total number of pupils at the school is 813, so I will take 10% as my number: 81.3, rounded down to 81.

The overall ratio of boys to girls in the school is 414:399. Now I need to work out my sample sizes:

Males = (414 / 813) multiplied by 81 = 41.2, rounded to 41
Females = (399 / 813) multiplied by 81 = 39.8, rounded to 40

Random Sampling

Now that I have the number of samples I need, I must select the pupils I will be taking. To do this I will use random sampling. I will take random samples until I have 81.
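The stratified split and the random selection described above can be sketched in Python. This is only an illustration, not the method used in the coursework (which used Excel): the function names and the row numbering are assumptions made for the example, and only the pupil counts 414, 399 and 813 come from the text.

```python
import random

def stratified_quotas(strata_sizes, sample_size):
    """Split sample_size across the strata in proportion to their sizes.

    Each quota is rounded to the nearest whole pupil, so in general the
    quotas may not add up exactly to sample_size and should be checked.
    """
    total = sum(strata_sizes.values())
    return {name: round(size * sample_size / total)
            for name, size in strata_sizes.items()}

def draw_sample(row_ids, quota, seed=None):
    """Pick `quota` distinct rows at random, so no pupil is chosen twice."""
    rng = random.Random(seed)
    return rng.sample(row_ids, quota)

# Pupil counts taken from the text; the row numbering is hypothetical.
school = {"male": 414, "female": 399}
sample_size = sum(school.values()) // 10      # 10% of 813, rounded down
quotas = stratified_quotas(school, sample_size)

male_rows = list(range(1, school["male"] + 1))
chosen_males = draw_sample(male_rows, quotas["male"], seed=1)
print(sample_size, quotas, len(chosen_males))
```

Sampling without replacement avoids a problem with rounding a random number for each pick, where the same row can come up more than once and has to be redrawn.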
In Excel the random selection can be done using a formula such as =ROUND(RAND()*120, 0). Once I have gathered the samples I am ready to start analyzing them.

Analysis

Hypothesis 1 Males

The first thing I need to do in my analysis is to analyze my graphs, which are the source of the investigation. I have created scatter graphs to show the relationship, if any, between the two data sources for my first hypothesis. I have separated them into male and female graphs as there is a separation in the numbers.

First male scatter graph

This first graph presented a bit of a problem. There was an anomalous result that affected the trend line and the scale of the graph. I decided to create a new graph that didn't include that one piece of data, so that I could analyze the rest of the data.

Second male scatter graph

This graph showed the data much more clearly and I could then start analyzing it. There is no correlation between the two sets of data. This means that it is unlikely that there is a relationship between IQ and average number of TV hours watched per week, so it may be that my hypothesis is incorrect. There is only a very slight gradient on the trend line that leans towards a negative correlation, but the gradient is not steep enough to draw any conclusions about the relationship between the two sets of data. I will have to use the cumulative frequency graphs and box plots to see if any conclusions can be made.

Cumulative frequency graphs for IQ and Average number of TV hours watched per week

From these graphs I could create box plots and compare the two sets of data. Before that I analyzed the cumulative frequency graphs to draw initial conclusions. The majority of the IQs for males are between 90 and 105, which shows that the data is not very spread out, as this section covers only a small area of the graph. For the TV hours graph, the data is again concentrated in one main area, in this case between 5 and 25.
There is an almost straight line near the top of the graph, which shows that there are likely to be some anomalous results, with no pupils between those results and the main bulk. Now I will create box plots so I can compare the two graphs together.

Box plots for cumulative frequency graphs of IQ and average number of TV hours watched per week (for interquartile ranges look at copies of graphs at the back)

From the box plots I can see that the data spread is relatively the same, apart from a possible anomalous result in the TV hours data. This similarity is the reason why the scatter graph had no correlation and therefore no relationship. This means that my hypothesis is wrong.

Hypothesis 1 Females

Again I will start with the scatter graphs. As with the male graph, I had an anomalous result that spread out the data and scaled down the graph so that most of the relevant data couldn't be analyzed. I then drew another graph without that specific piece of data.

Scatter graphs 1 and 2 to show the relationship between IQ and average number of TV hours watched per week for females

As you can see on both graphs, there is no correlation between the two sets of data. This again means that my first hypothesis is unlikely to be correct. There is only a slight gradient on the trend line, which is not steep enough to draw any conclusions from. There is another anomalous result on the graph, but it doesn't affect the trend line or my conclusions, so I left it on the graph. I will now create cumulative frequency graphs to see if they can help me to draw conclusions.

Cumulative frequency graphs for the IQ and number of TV hours watched per week

I will now analyze the graphs before drawing box plots to compare them. The IQ graph is much more erratic, which means that the data is spread over a larger range, although there is one area where the data is concentrated and the gradient very steep, between 95 and 105. The TV hours graph is much smoother and the data less spread.
The number of hours increases steadily to a certain point, then the graph goes flat until the end. This means that there is an anomalous result somewhere. I know that there can only be one or two anomalous results, because the point where it goes flat is at about 38 and there are only 39 sets of data in the graph. I will now look at the box plots to compare the two cumulative frequency graphs.

Box plots for cumulative frequency graphs of IQ and number of TV hours watched for females

The box plots for these graphs show me that the IQ data has a much larger range and that it is quite evenly spread. I can see this because the interquartile range is quite large and the median is centrally placed. There may be a few exceptions, as one pupil is likely to have a very low IQ, which is why the lowest value is so low. The TV hours data seems to be much more concentrated and the values are generally lower. This shows that there can't be any relationship between them, as they are each grouped in certain areas. Also, the box plot for TV hours shows that there is likely to be an anomalous result, as the highest value is so far above the upper quartile.

Hypothesis 2 Males

In this hypothesis I will be comparing the average number of TV hours watched per week and weight, to see if there is any relationship between them. I will again start with males and the scatter graphs.

Scatter graphs 1 and 2 to show the relationship between Weight and the Average number of TV hours watched per week for males

In these scatter graphs there is a slight negative correlation. This means that as the number of TV hours goes up, weight goes down. This may not be an accurate graph, as there are a few anomalous results that may have caused the trend line to have that gradient. If this is so, my hypothesis would be correct; if it is not, the gradient of the trend line still isn't steep enough to say with certainty that the correlation is real.
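How steep a trend line is, and how closely the points follow it, can be put into numbers instead of being judged by eye. The sketch below is an assumption-laden illustration and not part of the original analysis: it computes Pearson's correlation coefficient r and the least-squares gradient, and the sample values are invented to show how a single anomalous point can change both.

```python
from math import sqrt

def correlation_and_gradient(xs, ys):
    """Return (Pearson's r, least-squares gradient of y on x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / sqrt(sxx * syy)   # between -1 and 1; near 0 means no correlation
    gradient = sxy / sxx        # slope of the trend line
    return r, gradient

# Hypothetical values for illustration only (not the Mayfield data).
tv_hours = [5, 8, 10, 12, 14, 15, 18, 20, 25]
weight_kg = [48, 41, 50, 39, 45, 52, 40, 47, 44]

print(correlation_and_gradient(tv_hours, weight_kg))
# The same data with one anomalous pupil added at the end.
print(correlation_and_gradient(tv_hours + [170], weight_kg + [30]))
```

Comparing the two printed results shows the same effect described in the text: removing or keeping one extreme point can decide whether the trend line looks flat or sloped.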
I will need to use the cumulative frequency graphs to draw complete conclusions.

Cumulative frequency graphs for the number of TV hours watched and Weights of males

These two graphs look quite different. The weights graph has most of its data concentrated in the middle of the range, between 30 and 50, and looks like a normal cumulative frequency curve, whereas the TV hours graph has most of its data concentrated at the beginning, between 0 and 30, showing that there is likely to be an anomalous result at the end of the range. These anomalous results on the TV hours graph are what caused the slight negative correlation on the trend line. I will be able to make complete conclusions after looking at the female sample and seeing if that graph follows suit. The box plots for these graphs will look quite different and will make a simple comparison easy.

Box plots for cumulative frequency graphs of number of TV hours watched and Weight for males

From the box plots I can see that the two sets of data are almost identical in range, which would cause a straight line on the scatter graph; it is the anomalous results in the TV hours that caused the slight negative correlation. The weights box plot shows me that the data is quite evenly spread in the middle of the range, apart from a very heavy person at the end, which is why the highest figure is so far from the upper quartile. Overall the box plots show me that the similarity in the data means there is no relationship and my hypothesis was correct.

Hypothesis 2 Females

Again I will start with the scatter graphs to show the relationship between number of TV hours watched and weight. The graphs should be similar to the males' and the conclusions the same.
Again I had an anomalous result and had to create a second scatter graph without it.

Scatter graphs 1 and 2 to show the relationship between the Number of TV hours watched per week and Weight

The second scatter graph in this section, without the anomalous result, completely changed the trend line. The first graph looks a lot more like the male graph, whereas the second follows my hypothesis a lot better. In graph 1 there is a slight gradient which points towards a negative correlation, like that of the male sample. On the graph without the anomalous result there is clearly no correlation whatsoever, as the line is nearly horizontal. I will take the results of the male sample to be wrong because, as I said earlier, there are a few anomalous results which caused the trend line to have that gradient. Now I will look at the cumulative frequency graphs to see what results I get from them.

Cumulative frequency graphs for Average number of TV hours watched per week and Weight for Females

As on the males' graph, the TV hours for females have a lot of anomalous results. For the scatter graphs I removed them all, which gave no correlation. If the line at the top of the TV hours graph is blanked out, the two graphs look almost identical. This is why the scatter graph got a near-horizontal trend line. The box plots for these two graphs will look alike, except that there will be a much longer line at the end of the TV hours plot because of the anomalous results.

Box plots of cumulative frequency graphs for Number of TV hours watched and weights of females

These box plots show me the same as the males' did: the data is almost identical if one is placed on top of the other. This is what caused the horizontal line in my scatter graphs and proves my hypothesis.

Conclusion

Hypothesis 1

My first hypothesis has been proved incorrect. The scatter graphs show that there is no correlation between the two sets of data.
For my hypothesis to have been correct there would have needed to be a strong positive correlation. The cumulative frequency graphs and box plots again proved my hypothesis incorrect: the similarities in the two data sets' box plots showed that there was no relationship and showed why the scatter graphs showed a straight line. Both the male and female samples showed that my hypothesis was incorrect; although some anomalous results created a slight negative correlation in both, it was obvious that it was still wrong.

Hypothesis 2

My second hypothesis was proved correct. The scatter graphs showed that there was no correlation, which means no relationship. Although the male graphs did show a negative correlation, this was shown to be caused by a few anomalous results, first by the cumulative frequency graphs and later by the inconsistency with the female sample. The female scatter graph showed a near-horizontal trend line, which was what I needed to prove my hypothesis. The similarities in the cumulative frequency graphs and box plots further proved my hypothesis was correct.

Evaluation

The investigation went quite well. Although my first hypothesis was incorrect, it showed that careful analysis of data is needed before drawing conclusions. When I next do an investigation into data I will use histograms to aid me in my analysis, as they come in useful when looking for relationships in two sets of data, as the cumulative frequency graphs do. I could have made the cumulative frequency graphs a little better, as the program I used did not put a scale on the x axis but only the length of the range.
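The box plots used throughout are built from five numbers: the lowest value, lower quartile, median, upper quartile and highest value. As a rough sketch of how these could be computed, and how an anomalous result could be flagged by rule rather than by eye, the Python below uses the common 1.5 x interquartile range convention. That convention is an assumption introduced here, not the method used in the coursework, and the TV hours values are invented for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def iqr_outliers(values, k=1.5):
    """Flag values more than k interquartile ranges outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical TV hours per week, including one anomalous entry.
tv_hours = [5, 8, 10, 12, 14, 15, 18, 20, 25, 170]

print(five_number_summary(tv_hours))
print(iqr_outliers(tv_hours))
```

A rule like this gives the same treatment to the male and female samples, which avoids having to decide graph by graph which points to leave out.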
