Wednesday, July 1, 2009

Consolidating Spreadsheet Lists

I sometimes have to get a list of unique items either by removing duplicates or by pulling the items from two lists (by finding the elements missing from one of them). These are easy to figure when you have the data in a SQL database, but it's a little tougher using only spreadsheet functions. Below are two example approaches.

Update: I just learned about the COUNTIF function that might help; check it out first. Another possibility may be to "Paste Special..." and "Skip empty cells", but that doesn't always work. Another possibility would be to make a database out of the data and search for non-empty cells, but this has proven tricky (and it's many steps, anyway).

Find the Unique Items

This is like the SQL "SELECT DISTINCT word FROM words" (Update: this may not work when 2 cells are skipped!)
  1. Put the list in column A starting in row 1.
  2. In column B next to each value, paste a function which detects whether the item is a desired one; in this case, it's the last item in each run:

  3. In Column C, put a counter down to each desired element:

    =IF(B1; 0; C2+1)
  4. In column D and row 1, paste this function to tell where the first element lies:

    =C1 + 1
  5. In column D in the other rows, paste this function to tell where the next element lies:

    =D1 + 1 + INDIRECT(ADDRESS(D1+1; 2))
  6. In column E in each row, paste this function to tell each unique item in succession:


Find the Elements Missing (when compared to another list)

This is like the SQL "SELECT * FROM words WHERE word NOT IN ('forbidden', ...)"
  1. Paste the main list into column B and the sublist into column A.
  2. In column C, put the function to find the desired one (like step 2 above); in this case, VLOOKUP function to detect if the item from B exists in A:

    =ISERROR(VLOOKUP(B1; A$1:A$999; 1; 0))
  3. The rest of the steps are the same as steps 3-6 above. In column D, we count down to the desired element:

  4. In column E row 1, we put the row of the desired element:

    =D1 + 1
  5. In the rest of column E, we find each element:

    =E1 + 1+ INDIRECT(ADDRESS(E1+1;4))
  6. In column F, we find the value: