How to Take Out the Trash: Weeding Out Bad Data & Keeping It Out


A good way to prevent data contamination is to set up a categorical, tiered system that tracks data through the complete process to its eventual end goal. Monitor the data constantly and consistently to make sure it stays on track, and set outcomes to be measured at regular intervals.

The data tracked over these intervals is further divided into three groupings: macro, micro, and sub-micro levels. If diagrammed, the system resembles a tree: the macro level makes up the “trunk,” acting as the main portion to which all the “branches” (micro levels) are connected, with even smaller “twigs” (sub-micro levels) extending out from the branches. Also, much like a tree, the system may not only grow but also flourish and become beautiful, similar in look to a Mandelbrot set, if one can wax poetic about it.

Tracking data at the “trunk” macro level requires keeping the level tight and focused on a single subject. Adding too much information and too many outcomes at the macro level spreads the information too thin and makes the micro and sub-micro levels virtually useless for analysis, leaving an endlessly complicated periphery that bears ever more detailed, ouroboros-like loops. An example of an overarching macro-level “trunk” is a social media campaign. Keeping the macro level narrow provides an easy gateway to the important baseline information, so you can see quickly whether anything is amiss.

Beyond the macro level lie the “branches” of the micro level, which consist of segmented data from the macro level. Continuing with the example of a social media campaign as the macro level, it may be divided into micro levels based on the website or app used: Facebook, WhatsApp, Twitter, Instagram, Google+, YouTube, Snapchat, etc. The micro levels of a macro level may be whatever you wish them to be, provided they organize the more specific data effectively and make it easy to move down from the macro level to investigate anomalies (or to investigate in depth an anomaly that shows up at the macro level).

Beneath the micro levels lie the “twig” sub-micro levels, which are more specific still; the sub-micro levels of a social media platform's micro level could be distinct, identifiable individual campaigns or pieces of media. Dividing the data even further allows for even easier access and more streamlined approaches. It puts the marketing and sales team on an equal footing with the data analytics side and lets both sides tackle any enigmas in the data. Using this type of data integrity system to access the data sets easily helps the user prevent or combat “garbage” leaking in, and so reduces the chance of it spilling out where it can cause damage, saving users many headaches down the line.
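To make the trunk, branch and twig idea concrete, here is a minimal sketch in Python. It is only one possible way to model the system, not a prescribed implementation: the Node class, the find_anomalies helper, the 25 percent tolerance, the campaign names and every number in it are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of the tree: trunk (macro), branch (micro) or twig (sub-micro)."""
    name: str
    actual: int = 0      # observed metric, set only on twigs (hypothetical figures)
    expected: int = 0    # planned metric, set only on twigs (hypothetical figures)
    children: list["Node"] = field(default_factory=list)

    def totals(self) -> tuple[int, int]:
        """Roll twig values up to this level as (actual, expected)."""
        if not self.children:
            return self.actual, self.expected
        pairs = [child.totals() for child in self.children]
        return sum(a for a, _ in pairs), sum(e for _, e in pairs)


def find_anomalies(node, tolerance=0.25, path=()):
    """Walk from trunk to twigs and return the path of every twig whose
    actual value misses its expected value by more than `tolerance`."""
    path = path + (node.name,)
    if not node.children:
        off = node.expected > 0 and abs(node.actual - node.expected) / node.expected > tolerance
        return [" > ".join(path)] if off else []
    found = []
    for child in node.children:
        found.extend(find_anomalies(child, tolerance, path))
    return found


# Macro level: one campaign. Micro levels: platforms. Sub-micro: pieces of media.
campaign = Node("Social media campaign", children=[
    Node("Facebook", children=[
        Node("spring_video", actual=1200, expected=1000),
        Node("promo_post", actual=300, expected=900),
    ]),
    Node("Instagram", children=[
        Node("story_ad", actual=500, expected=520),
    ]),
])

print(campaign.totals())         # baseline figures at the trunk
print(find_anomalies(campaign))  # twigs that need a closer look
```

The trunk gives the baseline totals at a glance, and the drill-down reports the full path to each suspicious twig, so whoever is looking knows exactly which branch to open.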

Analyzing the Audience

Another helpful dissection may be the segmentation of the audience to better analyze response data from the client side. For example, an age range macro level could be divided into micro levels based on specific numerical ranges and gender (males 18-24, females 25-34, etc.), or on psychographic segmentations such as shared personality traits, consumer beliefs and lifestyles. Such data is valuable for building separate reports on distinct target audiences. The user can dive into a report to brainstorm or to work on a problem, solution or opportunity while knowing where all the information is and where to look to find it, much like the tiered levels of the corporate-side system.
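The same idea can be sketched for the audience side. The example below is an illustration under stated assumptions: the age ranges, the field names (age, gender, clicked) and the sample responses are all invented, and psychographic segments would need their own fields.

```python
from collections import defaultdict

# Assumed numerical ranges for the micro levels; adjust to your own audience.
AGE_RANGES = [(18, 24), (25, 34), (35, 44)]


def age_bucket(age):
    """Return the label of the age range containing `age`, or "other"."""
    for low, high in AGE_RANGES:
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"


def segment(responses):
    """Group responses into micro levels keyed by (age range, gender)."""
    groups = defaultdict(list)
    for response in responses:
        key = (age_bucket(response["age"]), response["gender"])
        groups[key].append(response)
    return dict(groups)


# Made-up response data for illustration only.
responses = [
    {"age": 19, "gender": "male", "clicked": True},
    {"age": 23, "gender": "male", "clicked": False},
    {"age": 29, "gender": "female", "clicked": True},
    {"age": 61, "gender": "female", "clicked": False},
]

segments = segment(responses)
for key, members in sorted(segments.items()):
    clicks = sum(1 for m in members if m["clicked"])
    print(key, f"{clicks} of {len(members)} clicked")
```

Each key is one micro level, so a separate report per target audience is a matter of reading one entry. Anything that lands in "other" is worth checking, since it may point to a gap in the ranges or to a data entry error.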

Note that while this system should allow more efficient data access and detection of anomalies in both instances, it is not perfect and may still require combing through a varying amount of information if incongruities or unpredictable events appear in the harvested data. Again, the most foolproof way to keep bad data from seeping into results is to maintain the initial commitment to data integrity and careful entry. Honesty and scrupulousness are still the best policy after all, and so are focus and care.

One last thing: be wary of accidentally overanalyzing the data in the process (“paralysis by analysis”). Even if you think you can get to the root of the problem by looking long enough, you may get overwhelmed and lose yourself in trying to process all the information at once. If you feel like you’re becoming swamped with facts and figures, remember to take a step back, breathe and relax.

You’ve now got a manageable system to work with at your fingertips.