How to Take Out the Trash: Weeding Out Bad Data & Keeping It Out

A good way to prevent data contamination is to set up a categorical, tiered system that tracks data through the complete process to its eventual end goal, with constant, consistent monitoring to make sure the data stays on track and with outcomes measured at set intervals.

The tracking itself is divided into three groupings: macro, micro, and sub-micro levels. Diagrammed, the system resembles a tree: the macro level makes up the “trunk,” the main portion to which all the “branches” (micro levels) connect, with even smaller “twigs” (sub-micro levels) extending out from the branches. Much like a tree, the system can not only grow but also flourish and become beautiful, similar in look to a Mandelbrot set, if one can wax poetic about it.

Tracking data at the “trunk” macro level requires keeping that level tight and focused on a single subject. Adding too much information and too many outcomes at the macro level will make the micro and sub-micro levels virtually useless for analysis and will spread the information too thin, producing an endlessly complicated periphery that bears increasingly detailed, ouroboros-like aspects. An example of an all-embracing macro-level “trunk” is a social media campaign. Keeping the macro level tapered provides an easy gateway for interpreting important baseline information and seeing whether anything is amiss.

Beyond the macro level lie the “branches,” or micro levels, which consist of segmented data from the macro level. Continuing with the example of a social media campaign as the macro level, it might be divided into micro levels based on the platform used: Facebook, WhatsApp, Twitter, Instagram, Google+, YouTube, Snapchat, etc. The micro levels of a macro level may be whatever you wish them to be, provided they effectively organize the more specific data and make it easy to move from the macro level down to investigate anomalies (or to investigate anomalies on the macro level in depth).

Beneath the micro levels lie the “twig” sub-micro levels, which are more specific still; the sub-micro levels under a social media platform’s micro level could be distinct, identifiable individual campaigns or pieces of media. Dividing the data this far allows for even easier access and more streamlined approaches, putting the marketing and sales team on an equal footing with the data analytics side and allowing both to combat any enigmas in the data. Using this type of data integrity system to access data sets easily helps the user prevent or combat “garbage” leaking in, and thereby mitigates it spilling out where it can cause damage, saving many headaches down the line.
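To make the structure concrete, here is a minimal sketch in Python of how such a tiered tree might be represented; the campaign, platform, and ad names, along with the metric fields, are hypothetical and exist only for illustration.

from dataclasses import dataclass, field

@dataclass
class Tier:
    """One node in the tracking tree: a macro 'trunk', micro 'branch', or sub-micro 'twig'."""
    name: str
    metrics: dict = field(default_factory=dict)   # e.g. {"clicks": 1200, "conversions": 35}
    children: list = field(default_factory=list)  # nested micro / sub-micro tiers

    def add(self, child: "Tier") -> "Tier":
        """Attach a lower tier and return it so the tree can be built step by step."""
        self.children.append(child)
        return child

    def rollup(self, metric: str) -> float:
        """Sum a metric across this tier and everything beneath it."""
        return self.metrics.get(metric, 0) + sum(c.rollup(metric) for c in self.children)

# Hypothetical example: one campaign (macro), split by platform (micro),
# then by individual ad (sub-micro).
campaign = Tier("Spring Enrollment Campaign")
facebook = campaign.add(Tier("Facebook"))
facebook.add(Tier("Video Ad A", metrics={"clicks": 1200, "conversions": 35}))
facebook.add(Tier("Carousel Ad B", metrics={"clicks": 800, "conversions": 12}))
instagram = campaign.add(Tier("Instagram"))
instagram.add(Tier("Story Ad C", metrics={"clicks": 950, "conversions": 20}))

print(campaign.rollup("clicks"))  # the macro-level baseline: 2950

Comparing the rollup at the trunk with the sums of its branches is one quick way to spot data that has gone astray.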

Analyzing the Audience

Another helpful dissection is segmenting the audience itself to better analyze response data from the client side. For example, an age-range macro level could be divided into micro levels based on specific numerical ranges (males 18-24, females 25-34, and so on) or on psychographic segmentations such as shared personality traits, consumer beliefs, or lifestyles. Such data is valuable for establishing separate reports on distinct target audiences, allowing the user to dive into them to brainstorm or to work on a problem, solution, or opportunity while knowing where all the information is and where to look to find it, much like the tiered levels of the corporate-side system.
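As a rough sketch of what that segmentation might look like in practice, the snippet below sorts hypothetical respondent records into micro-level buckets under an age-range macro level; the field names, ranges, and labels are assumptions for illustration only.

# Hypothetical respondent records; in practice these would come from your CRM or survey tool.
audience = [
    {"id": 1, "age": 22, "gender": "M", "lifestyle": "student"},
    {"id": 2, "age": 29, "gender": "F", "lifestyle": "young professional"},
    {"id": 3, "age": 33, "gender": "F", "lifestyle": "parent"},
]

def segment(person: dict) -> str:
    """Assign a micro-level label beneath an age-range macro level."""
    if person["gender"] == "M" and 18 <= person["age"] <= 24:
        return "males 18-24"
    if person["gender"] == "F" and 25 <= person["age"] <= 34:
        return "females 25-34"
    return "other"

# Group the audience so each segment can feed its own report.
segments = {}
for person in audience:
    segments.setdefault(segment(person), []).append(person)

for label, members in segments.items():
    print(label, len(members))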

Note that while this system, in both instances, should allow more efficient data access and detection of anomalies, it is still not perfect and may still require combing through a varying amount of information if incongruities or unpredictable events appear in the harvested data. Again, the most foolproof way to keep bad data from seeping into results is maintaining the initial commitment to data integrity and careful entry. Honesty and scrupulousness are still the best policy, after all, and so are focus and care.

One last thing: be wary of accidentally overanalyzing the data in the process (“paralysis by analysis”). Even if you think you can get to the root of the problem by looking long enough, you may get overwhelmed and lose yourself trying to process all the information at once. If you feel like you’re becoming swamped with facts and figures, remember to take a step back, breathe and relax.

You’ve now got a manageable system to work with at your fingertips.


Garbage In, Garbage Out: Why Bad Data is Worse Than No Data

Since the 1980s and ’90s, computers have grown in importance not just in a personal sense but in a business one as well. While technology has made life easier, it is still powered by humans (for now!) and therefore is not infallible. You simply can’t trust your insights when you can’t trust the inputs.

How does this concept relate to the education industry? Mainly through hardware, sales software and analytical marketing tools: while the leap from sales binders to Excel spreadsheets may have made enrollment and sales data more streamlined and convenient, the results ultimately depend on the data entered rather than on the vehicle.

With human error occurring more often than we want to admit, false or faulty data can still leak into a document or calculation and contaminate outcomes, resulting in misaligned marketing strategies, increased costs, and business instability. The problem is amplified when large and varied sets of big data need to be analyzed to help an organization make informed business decisions. This is often a complex process of examining large and varied data sets to uncover information, including hidden patterns, unknown correlations, market trends and buyer preferences, that helps organizations gain valuable insights, enhance decisions, and create new products. The relationship between bad input and bad output can be summed up in one phrase: garbage in, garbage out.

The evolution from Rolodex to spreadsheet or even smartphone app has certainly streamlined collecting information, but it hasn’t entirely eliminated user error. Innovations in hardware and software have made it uncomplicated and cost-effective to amass, stockpile, and evaluate copious amounts of sales and marketing data. If good information goes in, good data comes back out, and vice versa, which can significantly affect planning, buying and selling decisions. In education marketing, user error makes it harder to know the client. In essence, bad data is as good as no data, and perhaps even worse.

So, what can we do? While adherence to data integrity and entry, along with correct setup, ensures the best and most accurate results, human error will always be with us. Bad data input will always occur, but controlling for it and engineering procedures to supervise data integrity will help eliminate issues in decision making and avoid increased costs and organizational miscues. The best solution is to detect the ‘bad’ early and locate the problem before it gets worse. Fortunately, we can do something about data quality. No one wants to find out a pipe is clogged only once the basement is flooded. Admitting that you have a data quality problem is the key to the solution.
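As a rough illustration of catching the ‘bad’ early, the sketch below runs a few simple validation checks at the point of entry, before a record can contaminate downstream reports; the field names and rules are hypothetical placeholders, not a prescribed schema.

def validate_record(record: dict) -> list:
    """Return a list of problems found in one hypothetical sales/enrollment record."""
    problems = []
    if not record.get("email") or "@" not in record["email"]:
        problems.append("missing or malformed email")
    if not isinstance(record.get("age"), int) or not 0 < record["age"] < 120:
        problems.append("implausible age")
    if record.get("spend", 0) < 0:
        problems.append("negative spend")
    return problems

records = [
    {"email": "jane@example.edu", "age": 27, "spend": 150.0},
    {"email": "not-an-email", "age": 230, "spend": -40.0},
]

# Quarantine anything that fails a check instead of letting it flow into reports.
clean = [r for r in records if not validate_record(r)]
flagged = [(r, validate_record(r)) for r in records if validate_record(r)]
print(len(clean), "clean record(s);", len(flagged), "flagged for review")

The exact checks matter less than the habit: run them before the data reaches the spreadsheet, not after the report looks wrong.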

Tune in to my next article to find out how segmenting data by audience, a system of controls, a tiered tracking system and management oversight can help keep data on track. I’ll also offer an important warning about overanalyzing data that can save you great turmoil and stress.
