Friday, July 15, 2011

Healthcare, Marketing, and Patterns

Marketers often try to find patterns in their data in order to better advertise, and this got me thinking about connections to how health care professionals prescribe treatments.

We’re all familiar with companies sending email to market their products. What many don’t realize is that these companies mine their data for patterns that suggest which ads might be relevant to you. This process is severely limited by the type, quantity, and quality of data available about a particular customer. For instance, I’ve never purchased any products through Groupon, so the only data they have for me is my ZIP code. So although I thought it was silly that they recently sent me, a single male, an offer for a women’s fitness center membership, they merely lacked the data to do any better. Amazon has been better for me, but not by much. I only purchase products there a couple of times a year, so each individual purchase carries more weight than it should. Just because I once purchased a television there does not mean that I am interested in seeing ads for TVs every month. Although it’s possible their algorithms could benefit from some newer concepts from pattern-based analytics, their real issue is a lack of data about me. If I were to patronize them more frequently, they could offer ads more representative of my interests.

So how does this relate to healthcare? Well, just as I don’t patronize Amazon more than a few times a year, most people don’t visit their physician more than a handful of times a year either. The physician might have additional data in the form of family history, but that is always changing and tends to be incomplete. The intricacies of why “personalized medicine” has yet to prosper are beyond the scope of this conversation, but I suspect that data quality and quantity have been a hindrance. Better, more complete data about a patient’s health history could lead to a better understanding and therefore better treatment options and preventative measures.

Friday, July 8, 2011

Data, Data Everywhere

This post is about data and getting it ready for analysis. There's lots of data available, but not all of it is in a "ready-to-analyze" format. For pattern-based analytics, the data that goes in is absolutely critical to the patterns that come out in the end (garbage in, garbage out - see http://en.wikipedia.org/wiki/Garbage_In,_Garbage_Out). There are several aspects to consider about your data before you should look for patterns that it might contain:

Is the data structured?
Patterns rely on relationships between attributes of your data. For someone to visualize a pattern, the attributes must be structured so that they make sense in relation to each other and to the question at hand. If you have a pile of data and can't imagine how you might put it into a spreadsheet for analysis to answer your question, your data is most likely still too unstructured to find patterns in it. Before your data is suitable for pattern-based analytics, you will need a strategy for taking your raw data and putting it into a table or spreadsheet structure. (Good news - databases are already structured!)

Are attributes organized in a manner so they are useful to analyze?
If you have a dataset (already structured at this point), hopefully the data in the attribute columns can be sorted into groups suitable for making recognizable patterns. If this is not the case, a single attribute column might contain several pieces of data, and breaking them out into separate attributes can help you make the most of your patterns.
Imagine some sales data for movie tickets by date, using cells with "1/1/2011", "1/2/2011", and so on. Treated as a single item, a date isn't very useful; if you can break the date into attributes containing Month (1-12), Day of Month (1-31), Year (2010-2011), and Day of Week (Mon-Sun), you would most likely find your data telling you that ticket sales are highest on weekends. You would probably find your highest sales clustered on summer weekends. This isn't a novel discovery to most readers, I would imagine, but it is entirely derivable from the data being analyzed. Once you can find those patterns, the next step is to see if there might be other, more subtle patterns you can take advantage of in the same manner by reframing the attributes.
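
As a rough illustration of breaking a date out into separate attributes, here is a minimal Python sketch. The ticket counts, the `split_date` and `tickets_by` helper names, and the attribute names are all made up for this example; only the "1/1/2011"-style date format comes from the scenario above.

```python
from collections import defaultdict
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical (date, tickets sold) rows; the counts are invented for illustration.
sales = [
    ("1/1/2011", 120),  # Saturday
    ("1/2/2011", 110),  # Sunday
    ("1/3/2011", 40),   # Monday
    ("1/7/2011", 60),   # Friday
    ("1/8/2011", 130),  # Saturday
]

def split_date(text):
    """Break a month/day/year string into separate attributes."""
    d = datetime.strptime(text, "%m/%d/%Y")
    return {
        "month": d.month,
        "day_of_month": d.day,
        "year": d.year,
        "day_of_week": WEEKDAYS[d.weekday()],  # weekday() returns 0 for Monday
    }

def tickets_by(rows, attribute):
    """Total the tickets sold for each value of one derived attribute."""
    totals = defaultdict(int)
    for date_text, tickets in rows:
        totals[split_date(date_text)[attribute]] += tickets
    return dict(totals)

if __name__ == "__main__":
    print(tickets_by(sales, "day_of_week"))
```

Grouping by "day_of_week" instead of the raw date is what lets a weekend pattern show up; grouping the same rows by "month" would be the way to look for a seasonal one.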

Will you be comparing patterns from different data sources?
If you want to work with different datasets, you need to make sure the attributes will be equivalent so the patterns will be equivalent. You want to have confidence that a pattern seen in one set of data will be meaningful in the context of another set of data. A simple example is temperature. If you are looking at temperature values, it's very important to know if an attribute is in Fahrenheit or Celsius. If you're trying to see if there are common, seasonal buying patterns for soft drinks based on datasets from America and Europe, you might decide to incorporate average outside temperature data into your analysis. Hopefully you will have incorporated the "degrees C" or "degrees F" into your pattern; otherwise you could waste time trying to rationalize something that doesn't exist, or reach an incorrect conclusion because the data wasn't properly reconciled.
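
One way to guard against the unit mix-up is to convert every reading to a single scale before comparing datasets. The sketch below assumes each temperature value arrives with a "C" or "F" label; the `to_celsius` helper and the sample readings are hypothetical.

```python
def to_celsius(value, unit):
    """Convert a temperature to Celsius given its unit label ("C" or "F")."""
    unit = unit.strip().upper()
    if unit == "C":
        return float(value)
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown temperature unit: {unit!r}")

# Hypothetical average-temperature readings from two differently labeled sources.
american_readings = [(86, "F"), (50, "F")]
european_readings = [(30, "C"), (10, "C")]

if __name__ == "__main__":
    combined = american_readings + european_readings
    print([to_celsius(value, unit) for value, unit in combined])
```

Raising an error on an unlabeled or unrecognized unit is a deliberate choice here: it is better for the analysis to stop than to quietly mix scales.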

These are a few of the challenges that arise for many types of data analysis, but they become especially prominent in pattern-based analytics because of the link between the patterns and the data itself. Are there other potential pitfalls/solutions in the transition from data to pattern you can think of?

Wednesday, July 6, 2011

Gov 2.0

Many government websites hold a wealth of data. However, it has traditionally been very difficult to find and make use of that data due to usability issues. For the last several years, Tim O'Reilly has been promoting Gov 2.0 as an effort to bring many of the Web 2.0 ideas into government. One key idea is that of providing a standards-based platform for serving data. By making more data more consumable, ordinary citizens such as you and me can offer great insights into how our countries are running. Perhaps we can look for patterns of wasteful spending. Maybe we can create a mashup which sheds light on the convoluted inner workings of our government. Perhaps we can highlight some of the ways in which our government is or is not serving us well. This is great, as it allows you and me to use our skills to be better citizens and make the world a better place.

Many countries are participating in this effort. Below are just a few:
Canada
England
Sweden (by a private citizen)
USA

Various not-for-profit agencies have popped up to further this initiative as well, such as http://opendatasearch.org/, http://opengovernment.org/, and http://sunlightlabs.com/.

What type of data would you like to see from your government? What would you do with it?