Okay so I looked at GSPC fund thing and found the $/stock per year from 1950 to 2016. Each year I looked at February 1st and determined the price and listed them on an excel spreadsheet. Next I devided the year by the previous year and subtracted all of that by one to get the percentage of growth for that year.

For the average I added all these numbers up and divided by 65 (1950 doesn't count since there is no previous year to compare it to). I got 8.65%

The standard of deviation ended up being 16.24%.

So if I did this right, the conclusion is this. I have a 68% chance of a year's boundary being 8.65% +/- 16.24%. I have a 95% chance of a year's boundary being 8.65% +/- 32.47%. I have a 99.7% chance of a year's boundary being 8.65% +/- 48.71%.

Here are the questions:
1) Did I apply the correct average? I am aware that geometric means exist. I just thought since each set of data is independent from each other that I use a standard mean instead of a geometric mean. Is my understanding correct?
2) Is my sample size large enough? I'm not sure how one goes about setting up a sample size for this kind of thing. I have ideas about this but I'd rather base it on something tried and true. 65 seems rather small. Perhaps I should have done month by month or day by day instead. Does it matter?
3) Is there a standard sample size that works best? The closer the average is to 0% each year the larger the sample right?

Thanks in advance for helping me.

Mon Feb 08, 2016 3:18 am

global Full Member

Cash: $ 13.85

Posts: 64
Joined: 26 Jan 2016

Here are my stats list by the way in case this helps (I added stars to make it look nicer):

quote: So if I did this right, the conclusion is this. I have a 68% chance of a year's boundary being 8.65% +/- 16.24%. I have a 95% chance of a year's boundary being 8.65% +/- 32.47%. I have a 99.7% chance of a year's boundary being 8.65% +/- 48.71%.

Yes, that's correct. Investors often look at the +/- 2 std dev, the 95% answer reflects 19 years out of 20, that 'feels good' to them. So you get +/- 32% outcome in 19 years and there should be an outlier in that 20 year bunch. The min number of samples is usually considered to be 33 to have a statistical 'normal' distribution. (with smaller sample sizes, 'normalcy" is weakened, ie the confidence level of the outcome is lessened).
The +/-32% answer emphasizes the usual warning - ie, "stocks are not a one-year investment".
As you add data points, the variance statistically cancels - and at about 30 years it converges on about +11%/yr.

The std dev can also be found for individual stocks, if you check Spyders (SPY) the std dev should be listed - altho for smaller sample sizes than you used, your value is better.

Good analysis!

But as you can see - an exhaustive value for the std dev isn't of much value. You already knew that the market has wild one-year fluctuation, unsuitable for predictive analysis. You can use your methods to decide whether or not you need 10 yrs, 20 yrs, or 30 yrs to get useful predictive outcomes.

Mon Feb 08, 2016 4:27 am

global Full Member

Cash: $ 13.85

Posts: 64
Joined: 26 Jan 2016

Where do you get 11%? The average was calculated at 8.65%.

Are you saying that the average is actually 11% and the data happened to perform 2.35% worse than it should have?

If so, how do I find a source for averages of every type of stock out there.

Tue Feb 09, 2016 9:59 am

oldguy Senior Member

Cash: $ 736.65

Posts: 3582
Joined: 21 May 2006
Location: arizona

quote: Where do you get 11%? The average was calculated at 8.65%.

Your data points are good for finding the standard deviation that predicts one-year variances. (Your original goal.) And your std dev value is good.
But the arithmetic average (8.65%) of the 65 data points leaves out the compounding that you would need for multiple-year predicting.
Eg, the first four points would compound as follows: 1.266*1.067*1.1135"1.097 > 28.414 ( not 26.15). The error starts building quickly.

Going from 17.22 in 1950 to 1880.05 in 2016 ( 65 yrs) is 1880.05/17.22^(1/65) = 1.075,. Ie, 7.5%/yr with compounding, not 8.65%.
The 7.5%yr correlates with the 7.53%/yr on the website for the SP500 Index but without dividends rolled back in.

As for every type of stocks - Moody's published an annual book (about 2 inches thick) that showed each individual stock. I bought the book a couple times back in the 1970s (pre-computers), haven't looked for one for a long time.
But you are probably asking about each sector of stocks out there? You could look at the data from the ETFs - ie, the Financial Sector, Energy Sector, etc. Or - if you are asking about penny stocks, small cap, mid cap, big cap, int'l - you could look at the various indices.

Tue Feb 09, 2016 4:27 pm

global Full Member

Cash: $ 13.85

Posts: 64
Joined: 26 Jan 2016

I've given this some good thought. One thing that has been confusing me is your example. It actually appears that the 1.097 should actually be 1.0097 since in the data table it is .97%. So it is possible to manually multiply 17.22 to all of them together and it would work.

Your example at the bottom made it much clearer why the geometric mean is necessary. The ^(1/65) part makes sense, because it is like the averaging way for multiplied stuff. Which there is two ways to arrive at the first number. Either 1880.05/17.22. Also finding the product of all the percentages with a 1 at the beginning. This ensures that there is no negative numbers. Knowing this, I can now use the geomean function on the Excel spreadsheet. I wasn't using the correct ratios since I subtracted each number by 1. Now I know why it wasn't working earlier. Woohoo!

Thanks for the link too by the way.

Now I actually am confused about something else. How does the 8.65% help at all? Even for the year by year basis the 8.65% isn't proportional to the whole structure so it seems like it should be replaced with the 7%ish geometric mean number right?

P.S. I was wrong about 1950-2016. It is actually 67 sets of years since you have to count the year you start with. So 66 is the key number not 65. Oops.

Tue Feb 09, 2016 10:32 pm

global Full Member

Cash: $ 13.85

Posts: 64
Joined: 26 Jan 2016

Yeah, the 8.65% thing isn't proportional at all since the percentages are ratios. Each ratio needs to have the same "weight" in the calculation. The geometric mean is all I need. The arithmetic mean is completely worthless. Even for the year by year.

Tue Feb 09, 2016 10:42 pm

oldguy Senior Member

Cash: $ 736.65

Posts: 3582
Joined: 21 May 2006
Location: arizona

quote: should actually be 1.0097

You're right, I stand corrected.

quote: The geometric mean is all I need. The arithmetic mean is completely worthless. Even for the year by year.

Yes, the arithmetic average isn't of value. But the 66 individual variances that make up that number are needed to calculate the Std Dev.

quote: So it is possible to manually multiply 17.22 to all of them together and it would work.