Data Science: Shrinking Big Data into Meaningful Data

In this post I explain how less is more when it comes to using “big data.”  The best data is concise, meaningful, and actionable. It is both an art and a science to turn large, complex data sets into meaningful, useful information. Just like the later paintings of Monet capture the impression of beauty more effectively than a mere photograph, “small data” can help make sense of “big data.”

Monet Painting of the sun through the fog in London
Claude Monet, London

There is beauty in simplicity, but capturing simplicity is not simple. A young child’s drawings are simple too, but they very unlikely to capture light and mood like Monet did.

Worry not. There will be finance and math, but I will save the math for last, in an attempt to retain the interest of non “mathy” readers.

The point of discussing impressionist painting is show that reduction — taking things away — can be a powerful tool.  In fact, filtering out “noise” is both useful and difficult. A great artist can filter out the noise without losing the fidelity of the signal.  In this case, the “signal” is emotion and color and light as as perceived by a master painter’s mind.

 

Applying Impressionism to Finance

Massive amounts of data are available to the financial professional. Two questions I have been asking at Sigma1 since the beginning are 1) How to use “Big Compute” to crunch that data into better portfolios? 2) How to represent that data to humans — both investment pros and lay folk whose money is being invested?  After considerable thought, brainstorming, listening, and learning, I think we are beginning to construct a preliminary picture of how to do that — literally.

Portfolio Asset Relationships
Relationships between Portfolio Assets

While not a beautiful as a Monet painting, the picture above is worth a thousand words (and likely many thousands of dollars over time) to me.  The assets above constitute all of the current non-CASH building blocks of my personal retirement portfolio.  While simple, the above image took considerable software development effort and literally millions of computations to generate [millions is very do-able with computers].

This simple-looking image conveys complex information in an easy-to-understand form. The four colors — red, green, blue, and purple — convey four asset types: fixed income, US stocks, international stocks, and convertible securities. The angle between any two asset lines conveys the relative correlation between the pair.  In portfolio construction larger angles are better.  Finally the length of the line represents the “effectiveness” with which each asset represents its “angular position” within the portfolio (in addition to other information).

With Powerful Data, First Comes Humility, Next Comes Insight

I have applied the same visualizations to other portfolios, and I see that, according to my software, many of the assets in professionally-managed portfolios exhibit superior “robustness” to my own.  As someone who prides myself in having a kick-ass portfolio, this information is humbling, and took some time to absorb from an ego standpoint.  But, having gotten over it, I now see potential.

I have seen portfolios that have a significantly wider angle than my current portfolio.  What does this mean to me?  It means I will begin looking for assets to augment my personal portfolio.  Before I do that let me share some other insights. The plot combines covariance matrix data for the 16 assets in the portfolio, as well as semi-variance data for each asset.  Without getting to “mathy” yet, the data visualization software reduces 136 pieces of data down to 32 (excluding color). The covariance matrix and semi-variance calculation itself are also a reducers in that they combines 5 years monthly total-return data — 976 data points down to 120 unique covariance numbers and 16 semi-deviation numbers. Taking 976 down to 32 results in a compression ratio of 30.5:1.

Finally, as it currently stands, the visualization software and resulting plot say nothing about expected return.  The plot focuses solely on risk mitigation at the moment.  Naturally, I intend to change that.

Time for the Math and Finance — Consider Yourself Warned

I mentioned a 30.5:2 (71:2) compression ratio. Just as music and other data, other information, including financial information can be compressed.  However, only so much compression can be achieved in lossless manner.  In audio compression researchers have learned which portions of music and other audio can be “lost” without the listener telling the difference.  There is a field of psychoacoustics around doing just that — modeling what the human ear (and brain) can hear, and what gets “masked” by various physiological factors.

Even more important that preserving fidelity is extracting meaning. One way of achieving that is by removing “noise.” The visualization software performs significant computation to maintain as much angular fidelity as possible. As it optimizes angles, it keeps track of total error vis-a-vis the covariance matrix. It also keeps track of individual assets error (the reciprocal of fitness — fit versus lack of fit).

The real alchemy comes from the line-length computation.  It combines semi-variance data with various fitness factors to determine each asset line length.

Just like Mercator projections for maps incur unavoidable error when converting from a 3-D globe to a 2-D map, the portfolio asset visualizations introduce error as well.  If one thinks of just the correlation matrix and semi-variance data, each asset has a dimensionality of 8.5 (in the case of 16 assets).  Reducing from 8.5-D to 2-D is a complex process, and there are an infinite number of ways to perform such an operation!  The art and [data] science is to enhance the “signal” while stripping away the “noise.”

The ultimate goals of portfolio data visualization technology are:

1) Transform raw data into actionable insight

2) Preserve sufficient fidelity of relevant data such that the “map” can be used to reliably get to the desired “destination”

I believe that the first goal has been achieved.  I know what actions to take… trying various other securities to find those that can build a “higher-angle”, and arguably more robust, more resilient investment portfolio.

However, the jury is still out on the degree [no pun intended] to which goal #2 has or has not been achieved.  Does this simple 2-D map help portfolio builders reliably and consistently navigate the 8+ dimensional portfolio space?

What about 3-D Modelling and Visualization?

I started working with 2-D for one key reason — I can easily share 2-D images with readers and clients alike.  I want feedback on what people like and dislike about the visuals. What is easy to understand, what is not?  What is useful to them, and what isn’t?  Ironing out those details in 2-D is step 1.

Of course I am excited by 3-D. Most of the building blocks are in my head, and I can heavily leverage the 2-D algorithms.  I am, however, holding off for now. I am waiting for feedback from readers and clients alike.  I spend a lot of time immersed in the language of math, statistics, and finance.  This can create a communication gap that is best mitigated through discussion with other people with other perspectives.  I wish to focus on 2-D for a while to learn more about market needs.

That being said, it is hard to resist creating a 3-D portfolio asset visualizer. The geek in me is extremely curious about how much the error terms will reduce when given a third degree of freedom to work with.

The bottom line is: Please give me any feedback: positive, negative, technical, aesthetic, etc. This is just the start. I am extremely enthusiastic about where this journey will take me and my company.

Disclosure and Disclaimer

Securities mentioned in this post are holdings in my personal retirement accounts (e.g. 401K, IRA, Roth IRA) as of the day of initial publication of this post. The purpose of this post is to illustrate features of Sigma1 Financial software. This is NOT investment advice, and NOT a recommendation to buy, sell, or hold any securities. Please refer to the “Disclaimer” Tab of the main page of this site for further information.