# Dividends and Tax-Optimal Investing

The previous post showed after-tax results of a hypothetical 8% return portfolio. The primary weakness in this analysis was a missing bifurcation of return: dividends versus capital gains.

The analysis in this post adds the missing bifurcation. It is instructive to compare the two results. This new analysis accounts for qualified dividends and assumes that these dividends are reinvested. It is an easy mistake to assume that, because the qualified-dividend rate is identical to the long-term capital gains rate, dividends are equivalent to capital gains on a post-tax basis. This assumption is demonstrably false.

Though both scenarios model a total 8% annual pre-tax return, the “6+2” model (6% capital appreciation, 2% dividend) shows a lower 6.98% after-tax return for the most tax-efficient scenario, versus a 7.20% after-tax return for the capital-appreciation-only model. (The “6+2” model assumes that all dividends are reinvested post-tax.)
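A simplified Python sketch of the two models follows. Its assumptions may differ from the original spreadsheet: a zero starting cost basis, a 20% rate on both qualified dividends and long-term gains, dividends paid on the start-of-year value and reinvested after tax (which adds to cost basis), and a single liquidation after 30 years. Under these assumptions the capital-appreciation-only case comes out at about 7.20%, and the “6+2” case lands a little below it, near (but not necessarily exactly at) the 6.98% quoted above. The function name is illustrative.

```python
def after_tax_cagr(price_growth, dividend_yield, tax_rate=0.20, years=30, start=10_000.0):
    """Hypothetical sketch: after-tax CAGR with dividends taxed yearly and reinvested.

    Assumptions: zero starting cost basis; dividends are paid on the start-of-year
    value, taxed at `tax_rate` that year, and reinvested after tax (raising basis);
    the whole account is sold after `years` years and the gain taxed at `tax_rate`.
    """
    value, basis = start, 0.0
    for _ in range(years):
        dividend = value * dividend_yield
        reinvested = dividend * (1 - tax_rate)       # post-tax dividend goes back in
        value = value * (1 + price_growth) + reinvested
        basis += reinvested                          # reinvested dividends add to basis
    net = value - tax_rate * (value - basis)         # liquidation tax on the gain
    return (net / start) ** (1 / years) - 1


cap_only = after_tax_cagr(0.08, 0.00)   # all 8% is capital appreciation
six_two = after_tax_cagr(0.06, 0.02)    # "6+2": 6% appreciation, 2% dividend
print(f"capital-only: {cap_only:.2%}   6+2: {six_two:.2%}")
```

The dividend drag comes from the yearly tax on the 2% yield, which leaves less money compounding untaxed until the final sale.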

This insight suggests an interesting strategy to potentially boost total after-tax returns. Assume that our “6+2” model represents the expected 30-year average returns for a total US stock market index ETF like VTI. We can deconstruct VTI into a value half and a growth half. We then put the higher-dividend value half in a tax-sheltered account such as an IRA, while we leave the lower-dividend growth half in a taxable account.

This value/growth split only produces about 3% more return over 30 years, an additional future value of \$2422 per \$10,000 invested in this way.

While this value/growth split works, I suspect most investors would not find it worth the extra effort. The analysis above assumes that the growth half is a “7+1” model. In reality the split costs about 4 extra basis points of expense ratio: VTI has a 5 bps expense ratio, while the growth and value ETFs both have 9 bps expense ratios. This cuts the 10 bps per year after-tax boost to only 6 bps. Definitely not worth the hassle.
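The basis-point arithmetic in this paragraph can be checked in a few lines. The sketch below converts “about 3% more” ending wealth over 30 years into an annualized boost. It then simply subtracts the 4 bps expense-ratio difference, as the text does. That subtraction is a simplification, because expenses reduce pre-tax rather than after-tax return.

```python
years = 30
extra_wealth = 0.03            # "about 3% more" ending wealth from the value/growth split
extra_expense_bps = 9 - 5      # growth/value ETFs at 9 bps vs. VTI at 5 bps

# Annualize the 30-year cumulative advantage and express it in basis points.
annual_boost_bps = ((1 + extra_wealth) ** (1 / years) - 1) * 10_000
# Subtract the extra expense ratio directly, as the text does (an approximation).
net_boost_bps = annual_boost_bps - extra_expense_bps

print(f"gross: {annual_boost_bps:.1f} bps/yr, net: {net_boost_bps:.1f} bps/yr")
```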

Now, consider the Global X SuperDividend ETF (SDIV), which has a dividend yield of about 5.93%. Even if all of the dividends from this ETF receive qualified-dividend tax treatment, it is probably better to hold this ETF in a tax-sheltered account. All things equal, it is better to hold higher-yielding assets in a tax-sheltered account when possible.

Perhaps more important is to hold assets that you are likely to trade more frequently in a tax-sheltered account, and assets that you are less likely to trade in a taxable account. The trick, then, is to be highly disciplined about not selling taxable assets that have appreciated (it is okay to sell taxable assets that have declined in value; this is tax-loss harvesting).

The graph shows the benefits of long-term discipline on after-tax return, and the potential costs of a lack of trading discipline. Of course, this whole analysis changes if capital gains tax rates are increased in the future; one hopes one will have sufficient advance notice to take “evasive” action. It is also possible that one could be blindsided by tax-raising surprises that give no advance notice or are even retroactive! Unfortunately there are many forms of tax risk, including the very real possibility of future tax increases.

# Investment Tax Management Boosts Returns in Surprising Ways

### A Common Tax Misconception

When asked to consider tax deferral investment strategies, many people instinctively conclude that tax deferral benefits the investor at the expense of the government. Such a belief is half-right. Tax deferral ultimately benefits both the investor and the government’s tax revenues. While there are exceptions involving inheritance, in most other cases both parties benefit. Figure 1 summarizes the relationship between higher after-tax returns and higher nominal net cash flows to the government.

I lead with the government’s side of the tax equation for the sake of tax policy wonks in Washington, D.C. I suspect many of them already know this information, and this is simply another data point to add to their arsenal of tax facts. For the others, I hope this is a wake-up call. The message:

When investors, investment advisors, and fund managers successfully defer long-term capital gains, investors and governments win in the long run.

The phrase “in the long run” is important. When taxes are deferred, the government’s share grows along with the investor’s. In the short term, taxes are reduced; in the long run taxes are increased. For the investor this long-run tax increase is more than offset by increased compounding of return.

Please note that all of these win/win outcomes occur under an assumption of a fixed tax rate (20% in this example). It is also worth noting that these outcomes occur for funds that are spent at any point in the investor’s lifetime. This analysis does not necessarily apply to taxable assets that are passed on via inheritance.

Critical observers may acknowledge that the government’s tax “win” holds for nominal tax dollars, but wonder whether it still holds in inflation-adjusted terms. The answer is “yes,” provided the investor’s (long-run) pre-tax return g exceeds the (long-run) rate of inflation i by a sufficient margin. In this simple model g > i alone is not quite enough. Compare full deferral with a single tax event in year 1 (20% tax rate, 30 years). Deferral yields more inflation-adjusted revenue only when (1 + i)^30 < 1 + 0.2[(1 + g)^30 − 1], which for g = 8% means inflation below roughly 3.5%. Within that range, more effective tax-deferral strategies, with higher post-tax returns, still benefit both parties, although the slope of the yellow line gets flatter as inflation increases.
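Whether the government’s inflation-adjusted take still rises with deferral can be checked with a small closed-form sketch. It assumes the example’s setup (\$10,000, zero starting basis, 8% growth, 20% rate, 30 years) and compares two hypothetical cases:

- Full deferral.
- One tax event at the start of year 1, followed by deferral.

Each tax payment is deflated by cumulative inflation to today’s dollars. The function names are illustrative.

```python
def real_tax_revenue(inflation, g=0.08, tax_rate=0.20, years=30, start=10_000.0):
    """Inflation-adjusted (today's-dollar) tax revenue for two hypothetical strategies.

    Assumes a zero starting cost basis. Returns (deferred, early):
      deferred: all tax is paid when the account is liquidated after `years` years.
      early:    the whole position is sold and repurchased at time 0, then deferred.
    """
    growth = (1 + g) ** years
    deflator = (1 + inflation) ** years
    deferred = tax_rate * start * growth / deflator
    early_now = tax_rate * start                     # paid immediately, no deflation
    early_later = tax_rate * (1 - tax_rate) * start * (growth - 1) / deflator
    return deferred, early_now + early_later


def breakeven_inflation(g=0.08, tax_rate=0.20, years=30):
    """Inflation rate at which both strategies raise equal real revenue."""
    growth = (1 + g) ** years
    return (1 + tax_rate * (growth - 1)) ** (1 / years) - 1


for i in (0.00, 0.02, 0.06):
    deferred, early = real_tax_revenue(i)
    print(f"inflation {i:.0%}: deferred ${deferred:,.0f} vs. early ${early:,.0f}")
print(f"break-even inflation: {breakeven_inflation():.2%}")
```

Under these assumptions, deferral raises more real revenue at 0% and 2% inflation but less at 6%, with a break-even near 3.5%. The condition is therefore worth testing against one’s own inputs.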

Responsible investors face many challenges when trying to preserve and grow wealth. Among these challenges are taxes and inflation. I will start by addressing two important maxims in managing investment taxes:

1. Avoid net realized short-term (ST) gains
2. Defer net long-term gains as long as possible

It is okay to realize some ST gains; however, it is important to offset those gains with capital losses. The simplest way of achieving this offset is to realize an equal or greater amount of ST capital losses within the same tax year. ST capital losses directly offset ST capital gains.

A workable, but more complex, way of offsetting ST gains is with net LT capital losses. The term net is crucial here, as LT capital losses can only be used to offset ST capital gains once they have first been used to offset LT capital gains. Only LT capital losses in excess of LT capital gains offset ST gains.

If the above explanation makes your head spin, you are not alone. Managing capital gains is really an exercise in linear programming. In order to make this tax exercise less (mentally) taxing, here are some simple concepts to help:

• ST capital losses are better than LT capital losses
• ST capital gains are worse than LT capital gains
• When possible, offset ST gains with ST losses

Because ST capital losses are better than LT, it often makes sense to see how long you have held assets that have larger paper (unrealized) losses. All things equal it is better to “harvest” the losses from the ST losers than from the LT losers.
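The netting order described above can be sketched in Python. This is a simplified, hypothetical helper, not tax software. It follows the two steps described: first net within each holding period, then let any net loss in one category offset a net gain in the other. It deliberately ignores the annual limit on deducting net losses against ordinary income, and it ignores loss carryforwards.

```python
from dataclasses import dataclass


@dataclass
class NetGains:
    short_term: float   # net ST gain (positive) or loss (negative) after netting
    long_term: float    # net LT gain (positive) or loss (negative) after netting


def net_capital_gains(st_gains, st_losses, lt_gains, lt_losses):
    """Simplified netting sketch; all losses are passed in as positive numbers."""
    st = st_gains - st_losses          # step 1: net within each holding period
    lt = lt_gains - lt_losses
    if st > 0 and lt < 0:              # step 2: leftover LT losses offset ST gains
        offset = min(st, -lt)
        st, lt = st - offset, lt + offset
    elif lt > 0 and st < 0:            # leftover ST losses offset LT gains
        offset = min(lt, -st)
        st, lt = st + offset, lt - offset
    return NetGains(st, lt)


# A $5,000 ST loss cancels a $5,000 ST gain, leaving only the lower-taxed LT gain...
print(net_capital_gains(5_000, 5_000, 5_000, 0))
# ...while a $5,000 LT loss is absorbed by the LT gain first, leaving the ST gain taxable.
print(net_capital_gains(5_000, 0, 5_000, 5_000))
```

Both cases realize the same total gains and losses, but only the ST loss removes the higher-taxed ST gain. This is why ST losses are the more valuable ones to harvest.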

Managing net ST capital gains can potentially save you a large amount of taxes, resulting in higher post-tax returns.

### Tax Advantages for the Patient Investor

Deferring LT capital gains requires patience and discipline. Motivation can help reinforce patience. For motivation we go back to the example used to create Figure 1. The example starts today with a \$10,000 investment in a taxable account and a 30-year time horizon. The example assumes a starting cost basis of zero and an annual return of 8%.

This example was set up to help answer the question: “What is the impact of ‘tax events’ on after-tax returns?” To keep things as simple as possible a “tax event” is an event that triggers a long-term capital gains tax realization in a tax year. Also, in all cases, the investor liquidates the account at the start of year 31. (This year-31 sale is not counted in the tax event count.)

It turns out that it is not just the number of tax events that matters; the timing matters too. To capture some of this timing-dependent behavior, I set up my spreadsheets to model two different timing modes. The first is called “stacked”, and it simply stacks all tax events in back-to-back years. The second mode is called “spaced” because the tax events are spaced uniformly. Thus 2 stacked tax events occur in years 1 and 2, while 2 spaced tax events occur in years 10 and 20. The results are interesting:

The most important thing to notice is that if an investor can completely avoid all “tax events” for 30 years, the (compound) after-tax return is 7.2% per year, but if the investor triggers just one tax event, the after-tax return is significantly reduced. A single “stacked” tax event in year 1 reduces after-tax returns to 6.49%, while a single “spaced” tax event in year 15 reduces returns to 6.67%. Thus for a single event the spaced tax event curve is higher, while for all other numbers of tax events (except 30, where they are identical) it is lower than the stacked-event curve.

The main take-away from this graph is that tax deferral discipline matters. The difference between 7.2% and 6.67% after-tax return over thirty years is huge when framed in dollar terms. With zero (excess) tax events the after-tax result in year 31 is \$80,501. With one excess tax event (using the less harmful of the two timing modes), that sum drops to \$69,476.

In the worst case the future value drops to \$51,444 with an annual compound after-tax return of only 5.61%.
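A minimal Python sketch of this tax-event experiment follows, under an assumed timing convention. A tax event in year k realizes all gains at the start of that year, before that year’s growth, and the account is sold at the start of year 31. With that convention the sketch reproduces, to within rounding, three of the figures above:

- No events: \$80,501, 7.20%.
- A single stacked event: 6.49%.
- An event every year: \$51,444, 5.61%.

Spaced results are sensitive to exactly where the events fall, so they may not match the spreadsheet precisely. `simulate` and `stacked` are hypothetical helper names.

```python
def simulate(tax_event_years, g=0.08, tax_rate=0.20, years=30, start=10_000.0):
    """Hypothetical sketch of the tax-event experiment; returns (final value, CAGR) after tax.

    Assumed conventions: zero starting cost basis; a tax event in year k realizes all
    gains at the start of year k (after k - 1 years of growth) and immediately
    repurchases; the account is liquidated at the start of year `years + 1`.
    """
    value, basis = start, 0.0
    for year in range(1, years + 1):
        if year in tax_event_years:
            value -= tax_rate * (value - basis)   # pay LT capital gains tax on the gain
            basis = value                         # repurchase: basis = current value
        value *= 1 + g                            # one year of pre-tax growth
    net = value - tax_rate * (value - basis)      # final liquidation (not a "tax event")
    return net, (net / start) ** (1 / years) - 1


def stacked(n):
    """n tax events in back-to-back years 1..n."""
    return set(range(1, n + 1))


for n in (0, 1, 2, 30):
    net, cagr = simulate(stacked(n))
    print(f"{n:2d} stacked events: ${net:,.0f} ({cagr:.2%})")
```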

### Tax Complexity, Tax Modeling Complexity, and Other Factors

One of the challenges faced when bringing fresh perspectives to the tax-plus-investing dialog is in providing examples that paint the broad portfolio tax management themes in a concise way. The first challenge is that the tax code is constantly changing, so predicting future tax rates and tax rules is an imprecise game at best. The second challenge is that the tax code is so complex that any generalization will most likely have a counterexample buried somewhere in the tax code. The third complication is that, barring significant future tax code changes and obscure tax code counterexamples, creating a one-size-fits-all model for investors results in large oversimplifications.

I believe that tax indifference is the wrong answer to the question of portfolio tax optimization. The right answer is more closely aligned with the maxim:

All models are wrong. Some are useful.

This common saying in statistics gets to the heart of the problem and the opportunity of investment tax management. It is better to build a model that gives deeper insight into opportunities that exist in reconciling prudent tax planning with prudent investment management, than to build no model at all.

The simple tax model used in this blog post makes some broad assumptions. Among these is that the long-term capital gains rate will be the same for 30 years and that the investor will occupy the same tax bracket for 30 years. The pre-tax return model is also very simple: 8% pre-tax return each and every year.

I argue that models as simple as this are still useful. They illustrate investment tax-management principles in a manner that is clear and draws the same conclusions as analysis using more complex tax modelling. (Complex models also have their place.)

I would like to highlight the oversimplification I think is most problematic from a tax perspective. The model assumes all the returns (8% per year) are in the form of capital appreciation. A better “8%” model would assume a 2% dividend and 6% capital appreciation. Dividends, even when they receive qualified-dividend tax treatment, would bring down the after-tax returns, especially on the left side of the curve. I will likely remedy that oversimplification in a future blog post.

### Investment Tax Management Summary

1. Tax deferral does not hurt government revenues; it helps in the long run.
2. Realized net short-term capital gains can crater post-tax investment returns and should be avoided.
3. Deferral of (net) long-term capital gains can dramatically improve after-tax returns.
4. Tax deferral strategies require serious investment discipline to achieve maximum benefit.
5. Even simple tax modelling is far better than no tax modelling at all.  Simple tax models can be useful and powerful. Nonetheless, investment tax models can and should be improved over time.

# How to Write a Mean-Variance Optimizer: Part 1

### The Equation Everyone in Finance Should Know, but Many Probably Don’t!

… With thanks to codecogs.com which makes it really easy to write equations for the web.

This simple matrix equation is extremely powerful. This is really two equations. The first is all you really need. The second is merely there for illustrative purposes.

This formula says how the variance of a portfolio can be computed from the position weights $\mathbf{w}^\top = [w_1\ w_2\ \cdots\ w_n]$ and the covariance matrix $\mathbf{V}$, whose entries are:

• $\sigma_{ii} \equiv \sigma_i^2 = \mathrm{Var}(R_i)$
• $\sigma_{ij} \equiv \mathrm{Cov}(R_i, R_j)$ for $i \neq j$

The second equation is actually rather limiting.  It represents the smallest possible example to clarify the first equation — a two-asset portfolio.  Once you understand it for 2 assets, it is relatively easy to extrapolate to 3-asset portfolios, 4-asset portfolios, and before you know it, n-asset portfolios.
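Written out from the definitions above, the two-asset case of the general equation expands as follows (using the symmetry $\sigma_{12} = \sigma_{21}$):

$$
\sigma_p^2
= \begin{bmatrix} w_1 & w_2 \end{bmatrix}
\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
= w_1^2 \sigma_1^2 + 2 w_1 w_2 \sigma_{12} + w_2^2 \sigma_2^2
$$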

Now I show the truly powerful “naked” general form equation:
$$\sigma_p^2 = \mathbf{w}^\top \mathbf{V} \mathbf{w}$$
This is really all you need to know! It works for 50-asset portfolios. For 100 assets. For 1000. You get the point. It works in general. And it is exact. It is the E = mc² of Modern Portfolio Theory (MPT). It is at least 55 years old (2014 – 1959), while E = mc² is about 99 years old (2014 – 1915). Harry Markowitz, the Father of (M)PT, simply called it “Portfolio Theory” because:

Yes, I’m calling Markowitz the Einstein of Portfolio Theory AND of finance! (Now there are several other “post”-Einstein geniuses… Bohr, Heisenberg, Feynman… just as there are Sharpe, Scholes, Black, Merton, Fama, French, Shiller, [Graham?, Buffett?]…) I’m saying that a physicist who doesn’t know E = mc² is not much of a physicist. You can read between the lines for what I’m saying about those who dabble in portfolio theory… with other people’s money… without really knowing (or using) the financial analog.
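A quick numerical check of the general equation follows, using NumPy and synthetic random returns (not real asset data). The quadratic form wᵀVw, computed from the sample covariance matrix, matches the sample variance of the portfolio’s own return series. The same code works unchanged for any number of assets.

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.005, 0.04, size=(120, 5))   # synthetic: 120 months x 5 assets
w = np.array([0.30, 0.25, 0.20, 0.15, 0.10])        # position weights (sum to 1)

V = np.cov(returns, rowvar=False)          # 5 x 5 sample covariance matrix
var_matrix = w @ V @ w                     # sigma_p^2 = w^T V w
var_direct = np.var(returns @ w, ddof=1)   # sample variance of the portfolio's returns

print(var_matrix, var_direct)              # agree up to floating-point rounding
```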

### Why Markowitz is Still “The Einstein” of Finance (Even if He was “Wrong”)

Markowitz said that “downside semi-variance” would be better. Sharpe said “In light of the formidable computational problems…[he] bases his analysis on the variance and standard deviation.”

Today we have no such excuse. We have more than sufficient computational power on our laptops to optimize for downside semi-variance, $\sigma_d^2$. There is no such tidy, efficient equation for downside semi-variance. (At least not one that anyone can agree on… and none that is exact in any reasonable mathematical sense of the word ‘exact’.)
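By contrast, downside semi-variance is typically computed by brute force from the return history. The sketch below assumes one common definition: the mean squared shortfall below a target, averaged over all periods. The set of below-target periods changes with w, so there is no fixed matrix to reuse. The calculation must be redone for every candidate weight vector, and that is the extra computation today’s laptops make affordable. The helper name and synthetic data are illustrative.

```python
import numpy as np


def portfolio_semivariance(returns, w, target=0.0):
    """Downside semi-variance: mean squared shortfall of portfolio returns below `target`.

    Assumed definition: squared shortfalls are averaged over all periods (some authors
    divide by the number of below-target periods instead).
    """
    rp = returns @ w                            # portfolio return in each period
    shortfall = np.minimum(rp - target, 0.0)    # zero for periods at or above target
    return float(np.mean(shortfall ** 2))


# Synthetic example: 120 months x 3 assets of random returns (not real data)
rng = np.random.default_rng(1)
returns = rng.normal(0.005, 0.04, size=(120, 3))
w = np.array([0.5, 0.3, 0.2])
semi_var = portfolio_semivariance(returns, w)
semi_dev = np.sqrt(semi_var)                    # downside semi-deviation sigma_d
```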

Fama and French improve upon Markowitz (M)PT [I say that if M is used in MPT, it should mean “Markowitz,” not “modern”, but I digress.] Shiller, however, decimates it. As does Buffett, in his own applied way. I use the word decimate in its strict sense… killing one in ten. (M)PT is not dead; it is still useful. Diversification still works; rational investors are still risk-averse; and certain low-beta investments (bonds, gold, commodities…) are still poor very-long-term (20+ year) investments in isolation and relative to stocks, though they still can serve a role as Markowitz Portfolio Theory suggests.

### Wanna Build your Own Optimizer (for Mean-Return Variance)?

This blog post tells you most of the important bits.  I don’t really need to write part 2, do I?   Not if you can answer these relatively easy questions…

• What is the matrix expression for computing $E(R_p)$ based on $\mathbf{w}$?
• What simple constraint is $\mathbf{w}$ subject to?
• How does the general $\sigma_p^2$ equation relate to the efficient frontier?
• How might you adapt the general equation to efficiently compute the effects of a $\Delta\mathbf{w}$ event where $w_i$ increases and $w_j$ decreases? (Hint: “cache” the wx terms that don’t change.)
• What other constraints may be imposed on w or subsets (asset categories within w)?  How will you efficiently deal with these constraints?
• Is short-selling allowed?  What if it is?
• OK… this one’s a bit tricky:  How can convex optimization methods be applied?

If you can answer these questions, a Part 2 really isn’t necessary, is it?
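Here is one possible answer to the Δw question, sketched with NumPy. If the product Vw is cached, moving weight δ from asset j to asset i changes the variance by 2δ[(Vw)ᵢ − (Vw)ⱼ] + δ²(Vᵢᵢ − 2Vᵢⱼ + Vⱼⱼ). Each trial move therefore costs O(1), and refreshing the cache costs O(n). The helper name `shift_weight` and the synthetic covariance matrix are hypothetical.

```python
import numpy as np


def shift_weight(w, V, Vw, var, i, j, delta):
    """Move `delta` of weight from asset j to asset i, updating sigma_p^2 incrementally.

    With dw = delta * (e_i - e_j) and symmetric V:
        (w + dw)^T V (w + dw) = var + 2 dw^T (V w) + dw^T V dw
    Only a few cached entries are needed, instead of a full w^T V w recomputation.
    """
    new_var = (var
               + 2 * delta * (Vw[i] - Vw[j])
               + delta ** 2 * (V[i, i] - 2 * V[i, j] + V[j, j]))
    new_w = w.copy()
    new_w[i] += delta
    new_w[j] -= delta                          # total weight stays the same
    new_Vw = Vw + delta * (V[:, i] - V[:, j])  # O(n) cache refresh
    return new_w, new_Vw, new_var


# Usage with a synthetic positive semi-definite covariance matrix
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
V = A @ A.T / 4
w = np.full(4, 0.25)
Vw = V @ w
var = w @ Vw
w2, Vw2, var2 = shift_weight(w, V, Vw, var, i=0, j=3, delta=0.05)
print(np.isclose(var2, w2 @ V @ w2))   # incremental and full results agree
```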

# Surpassing the Frontier?

Suppose you have the tools to compute the mean-return efficient frontier to arbitrary (and sufficient) precision, given total-return time-series data for a set of assets/securities. What would you do with such potential?

I propose that the optimal solution is to “breach the frontier.” Current portfolios provide a historic reference. The reference (starting-point) portfolios provided so far have all left sufficient room for meaningful further optimization, as gauged by, say, improved Sortino ratios.

Often, when the client proposes portfolio additions, some of these additions allow the optimizer to push beyond the original efficient frontier (EF) and provide improved Sortino ratios. Successful companies contact ∑1 to see how each of their portfolios:

1) Lands on a risk-versus-reward (expected-return) plot
2) Compares to one or more benchmarks, e.g. the S&P 500 over the same time period
3) Compares to an EF comprised of assets in the baseline portfolio

Our company is not satisfied to provide marginal or incremental improvement. Our current goal is to provide our clients with more resilient portfolio solutions. Clients provide the raw materials: a list of vetted assets and expected returns. ∑1 software then provides a near-optimal mix of asset allocations that serves a variety of goals:

1) Improved projected risk-adjusted returns (based on semi-variance optimization)
2) Identification of under-performing assets (in the context of the “optimal” portfolio)
3) Identification of potential portfolio-enhancing assets and their asset weightings

We are obsessed with meaningful optimization. We wish to find the semi-variance (semi-deviation) efficient frontier and then breach it by including client-selected auxiliary assets. Our “mission” is as simple as that: better, more resilient portfolios.

# Approaching the Frontier

Disclosure: The purpose of this post is to show how I, personally, use the HALO Portfolio Optimizer software to manage my personal portfolio. It is not investment advice! I enter my personal opinions about which assets to select, and my expected one-year returns, into the optimizer configuration. The optimizer then provides an efficient frontier (EF) based on historic total-return data and my personal expected-return estimates.

Past performance is no guarantee of future performance, nor is past volatility necessarily indicative of future volatility.  Nonetheless, I am making the personal decision to use past volatility information to possibly increase the empirical diversification of my retirement portfolio with the goal of increasing risk-adjusted return.  Time will tell whether this approach was successful or not.

In my last post I blogged about reallocating my entire retirement portfolio closer to the MVO efficient frontier computed by the HALO Portfolio Optimizer. The zoomed-in plot tells the story to date:

The “objective space” plot is zoomed in and only shows a small portion of the efficient frontier. As you can see the black X is closer to the efficient frontier than the blue diamond, but naturally the dimensions are not the same. Using a risk-free rate of 0.5% the predicted Sharpe ratio has improved from 0.68 to 0.75 – a marked increase of about 10.3%.  [If you crunch the numbers yourself, don’t forget to annualize σ.]
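For readers crunching the numbers, the sketch below annualizes a monthly σ with the usual √12 rule before forming the Sharpe ratio. That rule assumes roughly independent month-to-month returns. The inputs in the first usage line are hypothetical; the post does not list the underlying expected returns or σ values. The second line recomputes the relative improvement from the two quoted Sharpe ratios.

```python
import math


def annual_sharpe(expected_annual_return, monthly_sigma, risk_free=0.005):
    """Sharpe ratio when expected return is annual but sigma is measured monthly.

    Annualizes sigma with the sqrt(12) rule, which assumes roughly independent
    month-to-month returns.
    """
    annual_sigma = monthly_sigma * math.sqrt(12)
    return (expected_annual_return - risk_free) / annual_sigma


# Hypothetical inputs: 8.5% expected return, monthly sigma chosen so annual sigma = 10%
print(annual_sharpe(0.085, 0.10 / math.sqrt(12)))

# Relative improvement from the post's Sharpe ratios, 0.68 -> 0.75
print(f"{0.75 / 0.68 - 1:.1%}")
```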

While a 10.3% expected improvement in Sharpe ratio is very significant, there is obviously room for compelling additional improvement: an expected Sharpe ratio of just north of 0.8 is attainable.

The primary reason the portfolio has not yet moved even closer to the efficient frontier is that 18.6% of the retirement portfolio is tied up in red tape as a result of my recent voluntary severance, or “buy-out,” from Intel Corporation. [Kudos to Intel for offering voluntary severance to all of my local coworkers and me. It is a much more compassionate method of workforce reduction than layoffs! I consider the package offered to me reasonably generous, and I gladly took the opportunity to depart and begin working full time building my startup.]

### Time to Get Technical

I won’t finish without mentioning a few important technical details. The points in the objective space (of monthly σ on the horizontal and expected annual return on the vertical) can be viewed as dependent variables of the (largely) independent variables of asset weights. Such points include the blue diamond, the black X, and all the red triangles on the efficient frontier. I often call the (largely) independent domain of asset allocation weights the “search space”, and the weightings in the search space that result in points on the efficient frontier the “solution space.”

One way to measure the progress from the blue diamond to the X is via improvement in the Sharpe ratio, which implicitly factors in the CAL, or the CML for the tangent CAL.  As “X” approaches the red line visually it also approaches the efficient frontier quantitatively and empirically.  However, X can make significant progress towards the efficient frontier, say point EF#9 specifically, with little or no “progress” in the portfolio weights from the blue diamond to the black X.

“Progress” in the objective space is reasonably straightforward: just use Sharpe ratios, for instance. However, measuring “progress” in the asset-allocation (weight) space is perhaps less clear. Generally, I prefer the use of the L1-norms of differences of the asset-weight vectors Wo (corresponding to the original portfolio weights, i.e., the blue diamond), Wx, and Wef_n. The distance from the blue diamond in search space to red triangle #9 is denoted as |Wef_9 − Wo|₁, while the distance from X in the search space is |Wef_9 − Wx|₁. Interestingly, the respective values are 0.572 and 0.664. Wx is, by this measure, actually further from Wef_9 in search space, but closer in objective space!

I sometimes refer to these as the “Hamming distances” (even though “Hamming distance” is typically applied to differences in binary codes or character inequality counts of two strings of characters.) It is simply easier to say the “Hamming distance from Wx to Wef_9” than the “ell-one norm of the difference of Wx and Wef_9.”
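The L1 distance is simple to compute, as this plain-Python sketch shows using hypothetical three-asset weights. When both vectors sum to 1, the L1 distance equals twice the total weight that has to move between them. That makes it a natural measure of reallocation effort.

```python
def l1_distance(w_a, w_b):
    """L1 norm of the difference of two weight vectors (the post's "Hamming distance")."""
    if len(w_a) != len(w_b):
        raise ValueError("weight vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(w_a, w_b))


w_o = [0.50, 0.30, 0.20]   # hypothetical original weights
w_x = [0.60, 0.30, 0.10]   # 10% moved from the third asset to the first
print(l1_distance(w_o, w_x))   # about 0.2: twice the 10% of weight that moved
```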

I have been working on a utility, temporarily called “User Tuner”, that makes navigating in both the search space and the objective space quicker, easier and more productive. More details to follow in a future post.

### Why Not Semi-Variance Optimization?

Frequent readers will know that I believe that mean semi-variance optimization (MSVO or SVO) is superior to vanilla MVO. So why am I starting with MVO? Three reasons:

• To many, MVO is less scary because it is somewhat familiar. So I’m starting with the familiar “basics.”
• I wanted to talk about Sharpe ratios first, because again they are more familiar than, say, Sortino ratios.
• I wanted to use “User Tuner”, and I originally coded it for MVO (though that is easily remedied).

However, asymptotically refining allocation of my entire portfolio to get extremely close to the MVO efficient frontier is only phase 1.  It is highly likely I will compute the SVO efficient frontier next and use a slightly modified “User Tuner” to approach the mean semi-variance efficient frontier… Likely in the next month or two, once my 18.6% of assets are freed up.

# Portfolio-Optimization Plots

I am happy to announce that the latest version of the HALO Portfolio-Optimization Suite is now available.  Key features include:

• Native asset constraint support
• Native asset-category constraint support
• Dramatic run-time improvements of 2X to over 100X

Still supported are user-specified risk models, including semi-variance and max-drawdown.  What has been temporarily removed (based on minimal client interest) is 3-D 2-risk modelling and optimization.  This capability may be re-introduced as a premium feature, pending client demand.

Here is a quick screenshot of a 20-asset, fixed-income portfolio optimization.  The “risk-free” rate used for the tangent capital allocation line (CAL) is 1.2% (y-intercept not shown), reflecting a mix of T-Bills and stable value funds.  Previously this optimization took 18 minutes on an \$800 laptop computer.  Now, with the new HALO software release, it runs in only 11 seconds on the same laptop.

# The Best Financial Models for Insight and Prediction?

The best models are not the models that fit past data the best, they are the models that predict new data the best. This seems obvious, but a surprising number of business and financial decisions are based on best-fit of past data, with no idea of how well they are expected to correctly model future data.

### Instant Profit, or Too Good to be True?

For instance, a stock analyst reports to you that they have a secret recipe to make 70% annualized returns by simply trading KO (The Coca-Cola Company).  The analyst’s model tells what FOK limit price, y, to buy KO stock at each market open.  The stock is then always sold with a market order at the end of each trading day.

The analyst tells you that her model is based on three years of trading data for KO, PEP, the S&P 500 index, aluminum and corn spot prices. Specifically, the analyst’s model uses closing data for the two preceding days; thus the model has 10 inputs. Back testing of the model shows that it would have produced 70% annualized returns over the past three years, or a whopping 391% total return over that time period. Moreover, the analyst points out that over 756 trading days 217 trades would have been executed, resulting in a profit 73% of the time (that the stock is bought).

The analyst, Debra, says that the trading algorithm is already coded, and U.S. markets open in 20 minutes. Instant profit is only moments away with a simple “yes.” What do you do with this information?

### Choices, Chances, Risks and Rewards

You know this analyst and she has made your firm’s clients and proprietary trading desks a lot of money. However, you also know that, while she is thorough and meticulous, she is also bold and aggressive. You decide that caution is called for, and allocate a modest \$500,000 to the KO trading experiment. If after three months the KO experiment nets at least 7% profit, you’ll raise the risk pool to \$2,000,000. If, after another three months, the KO experiment generates at least 7% again, you’ll raise the risk pool to \$10,000,000 as well as let your firm’s best clients in on the action.

Three months pass, and the KO-experiment produces good results: 17 trades, 13 winners, and a 10.3% net profit. You OK raising the risk pool to \$2,000,000. After only 2 months the KO-experiment has executed 13 trades, with 10 winners, and an 11.4% net profit. There is a buzz around the office about the “knock-out cola trade”, and brokers are itching to get in on it with client funds. You are considering giving the green light to the “Full Monty,” when Stan the Statistician walks into your office.

Stan’s title is “Risk Manager”, but people around the office call him Stan the Statistician, or Stan the Stats Man, or worse (e.g. “Who is the SS going to s*** on today?”)  He’s actually a nice guy, but most folks consider him an interloper.  And Stan seems to have clout with corporate, and he has been known to use it to shut down trades. You actually like Stan, but you already know why he is stopping by.

Stan begins probing about the KO-trade.  He asks what you know.  You respond that Debra told you that the model has an R-squared of 0.92 based on 756 days of back-tested data.  “And now?” asks Stan.  You answer, “a 76% success rate, and profits of around 21% in 5 months.”  And then Stan asks, “What is the probability that that profit is essentially due to pure chance?”

You know that the S&P 500 historically has over 53% “up” days, call it 54% to be conservative. So stocks should follow suit. To get exactly 23 wins on KO out of 30 tries is C(30, 23)*0.54^23*(0.46)^7 = 0.62%. To get at least 23 (23 or more wins) brings the percentage up to about 0.91%. So you say 1/0.0091, or about one in 110.

Stan says, “Your math is right, but your conclusion is wrong.  For one thing, KO is up 28% over the period, and has had 69% up days over that time.”  You interject, “Okay, wait one second… so my math now says about 23%, or about a 1 in 4.3 chance.”

Stan smiles, “You are getting much closer to the heart of the matter. I’ve gone over Debra’s original analysis, and have made some adjustments. My revised analysis shows that  there is a reasonable chance that her model captures some predictive insight that provides positive alpha.”  Stan’s expression turns more neutral, “However, the confidence intervals against the simple null hypothesis are not as high as I’d like to see for a big risk allocation.”
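The binomial figures in this dialogue can be reproduced with the standard library. Like the dialogue, this sketch treats each trade as an independent trial with a fixed win probability, an assumption real trades need not satisfy. `prob_at_least` is an illustrative helper name.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more wins in n trades."""
    return sum(comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(k, n + 1))


trades, wins = 17 + 13, 13 + 10                     # 30 trades, 23 winners so far
exactly = comb(trades, wins) * 0.54**wins * 0.46 ** (trades - wins)   # about 0.62%
at_least_54 = prob_at_least(wins, trades, 0.54)      # about 0.91%, roughly 1 in 110
at_least_69 = prob_at_least(wins, trades, 0.69)      # using KO's 69% up-day rate

print(f"{exactly:.2%}  {at_least_54:.2%} (1 in {1 / at_least_54:.0f})  {at_least_69:.1%}")
```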

### Getting all Mathy? Feedback Requested!

Do you want to hear more from “Stan”? He is ready to talk about adjusted R-squared, block-wise cross-validation, and data over-fitting. And why Debra’s analysis, while correct, was also incomplete. Please let me know if you are interested in hearing more on this topic.

Please let me know if I have made any math errors yet (other than the overtly deliberate ones).  I love to be corrected, because I want to make Sigma1 content as useful and accurate as possible.