Analyzing Open-Ended Questions in Surveys

Open-ended questions can help your organization uncover a wealth of insights. The responses to open-ended questions are the most raw and genuine part of survey analysis. These questions are completely unprompted: respondents can say or write anything that comes to mind, rather than being limited to a set of choices or guided toward a particular answer. With open-ended questions you get a true sense of how respondents feel, which makes them a good way to capture candid information.

The majority of surveys are filled with quantitative, closed-ended questions. These questions provide clear, structured data that is easy to analyze. However, if you want a more comprehensive understanding of what your respondents are thinking, include open-ended questions as well. They let respondents clarify, elaborate, or make suggestions, offering rich insight into the "why" behind their answers, and they are a great source of qualitative data within a quantitative survey.

The hardest part about including open-ended questions in your survey is analyzing the responses. Open-ended question coding is the process of taking the open-ended responses and classifying them into groups. Once coded, they can be analyzed in much the same way as multiple-response questions.


Here are some tips for coding open-ended questions:
  1. Read every response – This may take quite a long time, but it is worth it! You will really get to know the data. As you go through the responses, you will start to see some trends. Be sure to mark some quotes that resonate with you.
  2. Develop categories – Develop categories for the different trends that you see in the responses. Each response should go into at least one category. Sometimes there are multiple ideas expressed in a comment and therefore it may belong in multiple categories. This process is often referred to as “multi-coding”.
  3. Label each response with one or more coding categories – After you generate coding categories, assign at least one category to each response. This may be best done in an Excel sheet with responses in one column and coding category/categories in the next column.
  4. Review for major themes – After you've coded your responses and refined your categories, review to see which categories have the most responses and, therefore, represent your major themes (see the sketch after this list for a quick way to tally them). Once you've done this, think about what the themes are really saying: it's one thing to say "most people wanted more group activities", but how will you explain this to others so that it leads to program improvements?
  5. Identify patterns and trends – The next step is to see which categories are related and where patterns and trends can be identified. Are the themes related in some way, or are there a series of unrelated points being mentioned?
  6. Write up your analysis – For your analysis to be useful, you will need to summarize it so you can effectively communicate your findings to decision-makers. This is normally descriptive text incorporating some of the comments that exemplify your major themes. If you also have quantitative data, your summary of themes may complement or clarify what you saw in the numbers, and your write-up can tie it all together.
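The multi-coding and tallying in steps 2 through 4 can be prototyped in a few lines of code. Below is a minimal Python sketch; the responses, category names, and keyword lists are invented for illustration, and in practice the coding itself is usually done by a human reader, with keyword matching serving only as a first pass.

```python
# Minimal sketch of multi-coding open-ended responses and tallying themes.
# All responses, categories, and keywords below are hypothetical examples.
from collections import Counter

responses = [
    "I wish there were more group activities and better snacks.",
    "The schedule was confusing, but the staff were very helpful.",
    "More group activities please!",
]

# Each category is defined by a few keywords (a rough first pass only;
# a human coder would normally make the final call).
categories = {
    "group activities": ["group", "activities"],
    "food": ["snacks", "food", "meals"],
    "scheduling": ["schedule", "timing"],
    "staff": ["staff", "instructor"],
}

def code_response(text):
    """Return every category whose keywords appear in the response (multi-coding)."""
    text = text.lower()
    return [cat for cat, words in categories.items()
            if any(word in text for word in words)]

coded = {response: code_response(response) for response in responses}
theme_counts = Counter(cat for cats in coded.values() for cat in cats)

for response, cats in coded.items():
    print(f"{cats}: {response}")
print("Theme frequencies:", theme_counts.most_common())
```

The theme_counts tally plays the same role as the frequency count you would build next to the coding column in the Excel sheet described in step 3.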

Reviewing and coding all open-ended questions has become a standard part of the reports delivered by Data-Q Research, and the results are consistently rewarding for our clients.


Outliers are extreme data values that differ greatly from the majority of a data set. These values fall outside the overall trend present in the data. Outliers are one of those statistical issues that everyone knows about, but most people aren't sure how to deal with them. Many common descriptive and parametric statistics are highly sensitive to outliers, including the mean, the standard deviation, and the correlation coefficient (the median, by contrast, is relatively robust).
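To make that sensitivity concrete, here is a tiny sketch with invented numbers: a single extreme value shifts the mean and standard deviation dramatically while barely moving the median.

```python
# Illustration (invented data): one outlier moves the mean and standard
# deviation far more than it moves the median.
import numpy as np

clean = np.array([21.0, 22, 20, 23, 21, 22])
with_outlier = np.append(clean, 95.0)

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{label:13s} mean={data.mean():5.1f}  "
          f"median={np.median(data):5.1f}  std={data.std():5.1f}")
```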

So how should you deal with outliers? Is it justifiable to simply drop them all from the data? No. It is important to investigate the nature of an outlier before deciding, because sometimes outliers are genuine observations.

We will look at these by exploring a few examples.

Case 1: If the outliers are due to incorrectly entered or measured data, you should drop the outlier because it will affect the results.

               For example, I once analyzed a data set in which a man’s weight was recorded as 24 lbs. We all know that was physically impossible. His true weight was probably 150, 170, or 220 lbs, but since I didn’t know which one, I dropped the outlier.

Case 2: If the outlier affects assumptions but does not change the results, you may drop it, but note the omission in a footnote of your paper.

Neither the presence nor the absence of the two outliers in the graph below would change the regression line:

[Figure: regression line unchanged with or without the two outliers]

Case 3: If the outlier affects both the results and the assumptions, it is not appropriate to simply drop it. Run the analysis both with and without the outlier, and note in a footnote which data value was dropped and how the results changed.

Case 4: If the outlier creates a significant association, you should drop the outlier and should not report any significance from your analysis.

                      In the following graph, the relationship between X and Y is clearly created by the outlier.  Without it, there is no relationship between X and Y, so the regression coefficient does not truly describe the effect of X on Y.

[Figure: relationship between X and Y driven entirely by a single outlier]
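A quick way to run the with-and-without check suggested in cases 3 and 4 is to fit the regression twice and compare the slopes. The sketch below uses invented data: with the outlier included the slope looks substantial; without it the relationship is essentially flat.

```python
# Sketch (invented data): compare a simple regression with and without
# a single influential outlier.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 20.0])
y = np.array([5, 6, 4, 5, 6, 5, 30.0])   # the last point is the outlier

slope_all, intercept_all = np.polyfit(x, y, 1)
slope_trim, intercept_trim = np.polyfit(x[:-1], y[:-1], 1)

print(f"with outlier:    slope = {slope_all:.2f}")   # strong apparent effect
print(f"without outlier: slope = {slope_trim:.2f}")  # close to zero
```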

Be careful when dealing with outliers: do not assume they are always experimental errors or exceptions. An outlier can reflect a genuinely different property and may indicate that the observation belongs to a different population.

Outliers should be given special attention until their cause is known, and that cause is not always random chance. A careful look is therefore needed before an outlier is discarded.

So in those cases where you shouldn’t drop the outlier, what do you do?

One option is to try a transformation.  Square root and log transformations both pull in high values.  This can make assumptions work better if the outlier is in the dependent variable, and can reduce the leverage of a single point if the outlier is in an independent variable.
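As a rough illustration, the sketch below (invented numbers) shows how a log transform shrinks the influence of one large value on the spread of the data.

```python
# Sketch (invented data): a log transform pulls in a high outlier.
import numpy as np

y = np.array([12.0, 15, 14, 13, 16, 15, 240])   # 240 is the outlier

print(f"raw scale: mean = {y.mean():.1f}, std = {y.std():.1f}")

y_log = np.log(y)
print(f"log scale: mean = {y_log.mean():.2f}, std = {y_log.std():.2f}")
# On the log scale the extreme value no longer dominates the spread, so
# assumptions such as roughly normal residuals are easier to satisfy.
```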

Another option is to try a different model. This should be done with caution, but it may be that a non-linear model fits better.  For example, in case 3, perhaps an exponential curve fits the data with the outlier intact.

Whichever approach you take, you need to know your data and your research area well.  Try different approaches, and see which make theoretical sense.

Have questions about outliers or want to learn more? Please write to us at info@dataqresearch.com


Data Analysis – Cross Tabulation & Its Benefits

In evaluating survey results, cross tabulation (cross-tabs) is a preliminary phase of research that helps researchers form initial insights into the data by examining the relationship between two or more variables, i.e., how different variables relate to each other.

Cross Tabulation enables researchers to extract the first cut of meaningful information. For instance, in a study Data-Q conducted, we found a correlation between how often people worry about specific political issues and how they voted in the election.

Cross Tabulations are simply data tables that present the results of the entire group of respondents as well as results from sub-groups of survey respondents. Cross tabulations enable you to examine relationships within the data that might not be readily apparent when analyzing total survey responses.

Cross Tabulation can also be useful in reviewing customer feedback. For example, a retail store could find that a strong majority of those who are dissatisfied with the store’s service purchased a specific product or worked with a specific employee during checkout. In either case, the store could quickly and easily remedy the issue.

Cross tabulations may also include additional statistics within each cell, such as:

  • Frequencies
  • Row percents
  • Column percents
  • Summary statistics (mean, median, standard deviation, quartiles, etc.)
  • Significance testing (T-Test, Z-Test)
  • Chi-square test (see the sketch after this list)
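To see a couple of these cell statistics computed, here is a minimal Python sketch using an invented two-by-two table of satisfaction by store location; scipy's chi2_contingency carries out the chi-square test listed above.

```python
# Sketch (invented counts): row percents and a chi-square test for a
# small cross tabulation of satisfaction by store location.
import numpy as np
from scipy.stats import chi2_contingency

# rows = satisfied / dissatisfied, columns = store A / store B
counts = np.array([[80, 60],
                   [20, 40]])

row_percents = counts / counts.sum(axis=1, keepdims=True) * 100
print("row percents:\n", np.round(row_percents, 1))

chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
```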

All of these features make cross tabulation accessible even to a novice researcher. The benefits of using cross tabulations in survey analysis include:

  • Little or no understanding of statistical concepts is necessary for interpretation.
  • Readers can easily observe patterns of association.
  • Readers can also see if the pattern is weaker across some rows.
  • Can put either variable in rows or columns.
  • Very flexible – you can easily take the information from a cross tabulation and create a visual chart or graph, and cross-tabs can be built from almost any variable.
  • Accessible interpretation

[Figure: example cross tabulation of visit frequency by age group]

The example above gives you a brief look at how you might use cross tabulation analysis for your own survey. You can analyze the frequency of visits and break the results down by age. The choices for the first question are displayed to the left of the table data (the row labels), and the choices for the second question are displayed across the top (the column headings). This association can be flipped if needed.
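A table like the one described above can be produced directly from respondent-level data. The sketch below uses pandas and a handful of invented records, with visit frequency in the rows and age group in the columns, matching the layout of the example.

```python
# Sketch (invented respondents): cross tabulation of visit frequency by age group.
import pandas as pd

survey = pd.DataFrame({
    "visit_frequency": ["Weekly", "Monthly", "Weekly", "Rarely",
                        "Monthly", "Weekly", "Rarely", "Monthly"],
    "age_group":       ["18-34", "18-34", "35-54", "35-54",
                        "55+",   "55+",   "18-34", "35-54"],
})

# Frequencies, with row and column totals in the margins
table = pd.crosstab(survey["visit_frequency"], survey["age_group"], margins=True)
print(table)

# Column percents: how each age group splits across visit frequencies
col_pct = pd.crosstab(survey["visit_frequency"], survey["age_group"],
                      normalize="columns") * 100
print(col_pct.round(1))
```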

Data-Q can cross-analyze one question against any number of other questions and produce presentation-quality tabulations. Data-Q's data processing and tabulation services use best-of-breed software packages to process, clean, manage, and tabulate data.

Have questions about cross tabulation or want to learn more about advanced analysis with Data-Q Research? Please write to us at info@dataqresearch.com


Weighting Data: WHY, WHEN, HOW & A Few Cautions!

Data weighting is a technique commonly used in market research. Many people reading this will already know what it means; if you're not one of them, here is the idea: during a survey it is not possible to interview everyone, so only a sample of the population is interviewed. If that sample does not accurately reflect the proportions of various groups in the total population, the results are adjusted (weighted) to overcome the sampling bias, or to give more or less significance to certain factors based on their estimated relevance to the question at hand.

Weighting is most effective when you have reliable, precise information about what the actual numbers should look like. A common example: you are in the business of selling men's clothing, and you know that men make up 80% of your customer base and of purchasing decisions. If you field a survey and the responses come back 50% male and 50% female, you have sample bias: women constitute 50% of the survey data but only 20% of your customer base. But what if you don't know exactly what the overall population looks like? Any weights you assign will be guesses, educated guesses perhaps, but still subject to the possibility that your estimates are off, which will in turn affect the accuracy of the results.

Without getting heavily into the math, weighting to a large degree increases the margin of error and weakens the statistical significance of the data, especially if the number of respondents involved is relatively small. Personally, I feel uneasy weighting anything by a factor of more than around 1.3 or 1.4.
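One common rule of thumb for quantifying that loss of precision is Kish's effective sample size, which shrinks as the weights become more uneven. The sketch below applies it to a hypothetical 50/50 sample of 800 interviews weighted to the 80/20 split from the earlier example (weights of 1.6 for men and 0.4 for women).

```python
# Sketch (hypothetical sample): Kish's effective sample size for a
# 50/50 sample of 800 interviews weighted to an 80/20 target.
weights = [1.6] * 400 + [0.4] * 400

n_eff = sum(weights) ** 2 / sum(w ** 2 for w in weights)
print(round(n_eff))   # ~588: the 800 weighted interviews carry roughly
                      # the precision of 588 unweighted ones
```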


We try to address this issue by making sure that our sampling isn't so skewed that such drastic weights are necessary. That approach doesn't help when the data has already been collected, but it does eliminate the mindset that it's okay to be sloppy in the project design because you can "fix" the issues later with weighting.

Another issue with weighting is that, if you do it, you need to be prepared to justify and clearly communicate your assumptions to the client and potentially to any constituency they might be sharing the results with. For example, if you have conducted a public opinion survey for a community on issues of a sensitive nature (i.e. a potential school closing, allowing certain large-scale construction, etc.) people will scrutinize your methodology very closely.  If they learn that you have weighted the data to amplify the opinions of certain groups relative to others, controversy is likely to follow. In such cases, it will often be hard to convince people that the weighting is a valid analysis technique. They will see it as unfairly stacking the deck in one side’s favor, and if you haven’t done it properly, they’ll be right. That’s not to say you should never use weighting in that kind of project, just that you’ll really need to make sure you can thoroughly justify it – which is ultimately a good rule of thumb to use any time the question of whether or not to weight data arises.

How to Calculate the Weight Factor

Weight Factor = Target Sample % / Actual Sample %
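Applied to the men's clothing example from earlier in the post (an 80% male target versus a 50/50 achieved sample), the formula works out as in the short sketch below; it is a sketch of the arithmetic only, not a production weighting routine.

```python
# Sketch: weight factor = target sample % / actual sample %,
# using the illustrative men's clothing figures from this post.
target = {"male": 0.80, "female": 0.20}   # known customer base
actual = {"male": 0.50, "female": 0.50}   # unweighted survey sample

weights = {group: target[group] / actual[group] for group in target}
print(weights)   # {'male': 1.6, 'female': 0.4}

# A weight of 1.6 already exceeds the 1.3-1.4 comfort range mentioned
# above, which is exactly the kind of trade-off to flag for the client.
```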

Have questions about weighting data in market research? Please write to us at info@dataqresearch.com
