• Advanced Filtering Unique Records (XP)

    Author
    Topic
    #401071

    I am using the advanced filter to sort our unique records in a multi thousand row spreadsheet. Is there any way to filter out the unique records and have the filter ignore one of the columns? I don’t want the “uniqueness” to be determined by the date field, but I need to keep the data from that field associated with the other columns. I have tried filtering on the list and using the other columns as the criteria info, but it is not sorting out the duplicates.

    Viewing 1 reply thread
    Author
    Replies
    • #787173

      But what if two records are identical but for the date field? If you ignore the date field, you will end up with one record. Which of the two dates should be used for this single record? Or do I misinterpret your question

      • #787175

        I just want one record. If the other 3 columns make the row unique, the date does not matter, but I still need to capture the date as it is associated to the rest of the row.

        • #787187

          To reask Hans’ question (since you didn’t answer it) with an example. In the table below the 3 non-date columns are identical They are only different in date. All 3 rows are unique. If you ignore the date they are all the same. If you want to find the unique items (ignoring the date) this list gives only 1.

          Date Name Letter Number
          01/01/2004 Bill A 1
          01/02/2004 Bill A 1
          01/03/2004 Bill A 1

          Which date should be associated with this 1 item?

          Steve

          • #787189

            Steve: I had answered, sorry you did not get. I have been using the most recent date, however, that is not as important as isolating the one record. These are trouble tickets, and they may be re-reported on different days, so multiple instances may occur. I only need one instance per unique record of the name/letter/number combination.

          • #787190

            Steve: I had answered, sorry you did not get. I have been using the most recent date, however, that is not as important as isolating the one record. These are trouble tickets, and they may be re-reported on different days, so multiple instances may occur. I only need one instance per unique record of the name/letter/number combination.

        • #787188

          To reask Hans’ question (since you didn’t answer it) with an example. In the table below the 3 non-date columns are identical They are only different in date. All 3 rows are unique. If you ignore the date they are all the same. If you want to find the unique items (ignoring the date) this list gives only 1.

          Date Name Letter Number
          01/01/2004 Bill A 1
          01/02/2004 Bill A 1
          01/03/2004 Bill A 1

          Which date should be associated with this 1 item?

          Steve

        • #787191

          Here is a small example table:

          Col1 Col2 Col3 Col4
          north widget red 02/11/04
          north widget red 02/18/04
          north widget blue 02/18/04

          If we ignore Col4 when deciding which rows are unique, we end up with this:

          Col1 Col2 Col3 Col4
          north widget red ???
          north widget blue 02/18/04

          What should be in the place of the question marks?

          • #787197

            I prefer the most recent date.

            • #787205

              The easiest way would be to use the advanced filter (without extracting the date) then in the date column use a formula to grab the most recent date for that unique occurence. It would have to be an Array or a user function. which ever is not as sluggish. Something like this ARRAY

              =min(if((Col1=Value1)*(Col2=Value2)*(Col3=Value3)*…*(Coln=Valuen),DateCol))

              Where Cols are the range of each col and Values are the values to compare.

              If it were too slow, you could just have a macro grab the unique entries (ignoring the date) and and then lookup the most recent date for it. Then you would call the macro rather than the adv filter routine.

              We would need a lot more info to write formulas or code anything.

              Steve

            • #787206

              The easiest way would be to use the advanced filter (without extracting the date) then in the date column use a formula to grab the most recent date for that unique occurence. It would have to be an Array or a user function. which ever is not as sluggish. Something like this ARRAY

              =min(if((Col1=Value1)*(Col2=Value2)*(Col3=Value3)*…*(Coln=Valuen),DateCol))

              Where Cols are the range of each col and Values are the values to compare.

              If it were too slow, you could just have a macro grab the unique entries (ignoring the date) and and then lookup the most recent date for it. Then you would call the macro rather than the adv filter routine.

              We would need a lot more info to write formulas or code anything.

              Steve

            • #787223

              Another idea is to use an autofilter, though it will require 2 additional columns. This method assumes that the lowest rows have the lowest dates. It picks the dates from the lowest rows of the “multiple unique” entries.

              The first column (could be hidden) would be a concatenation of all the columns that you want to be considered in the unique. I will assume it is in Col G and the data starts in row 2 (header in row 1) so G2 would be something like:
              =B2&C2&D2&E2&F2
              to find the unique entries in B-F (A has the date, it is not included)

              Then in H2:
              =countif($G$2:G2,G2)=1

              Copy G2:H2 down all the rows with data

              Add the autofilter: data – filter – autofilter. Then select col H dropdown and select TRUE. You will get the unique list of all the items in cols B-F with all other cols the values from the lowest rows. If you need to be lowest date it will have to be sorted by date first.

              Steve

            • #787341

              See the attachment. Rows 1 and 2 would not be unique even though the dates are different. Rows 3 and 4 would be unique. Right now, the advanced filtering gets me to this stage, but I have still having to go through 20K rows to see if there are any duplicates. I would like to filter out any rows that the only uniqueness is the date.

            • #787361

              The attached zipped workbook uses a macro and a function to sort the data and remove duplicates, keeping the most recent date. To see the effect, press Alt+F8 in the workbook, select FilterUnique and click Run. You should not have macro security set to high, for that disables macros completely. (I recommend setting it to medium.)

            • #787377

              This is great! Thanks!

            • #787385

              You could in fact run the macro without copying it, but here is how to copy it.

              Before you do so, make a backup copy of your workbook, so that you can fall back on that if something goes wrong!

              Open both the workbook I posted and your own workbook.
              Activate the Visual Basic Editor by typing Alt+F11.
              Make sure that the Project Explorer is visible (Ctrl+R). This looks like the left hand pane of Windows Explorer, but with Excel objects instead of drives and folders.
              If necessary, expand until you see VBAProject (FilterUnique.xls) | Modules | Module1.
              Drag Module1 to your own workbook and drop it. This will copy the module with the macro in it to your workbook.

              If you want to see the VBA code, double click Module1. You will see one macro (Sub FilterUnique) and one auxiliary function (Function CompareRows)
              If necessary, you can modify little bits of the code.

            • #787386

              You could in fact run the macro without copying it, but here is how to copy it.

              Before you do so, make a backup copy of your workbook, so that you can fall back on that if something goes wrong!

              Open both the workbook I posted and your own workbook.
              Activate the Visual Basic Editor by typing Alt+F11.
              Make sure that the Project Explorer is visible (Ctrl+R). This looks like the left hand pane of Windows Explorer, but with Excel objects instead of drives and folders.
              If necessary, expand until you see VBAProject (FilterUnique.xls) | Modules | Module1.
              Drag Module1 to your own workbook and drop it. This will copy the module with the macro in it to your workbook.

              If you want to see the VBA code, double click Module1. You will see one macro (Sub FilterUnique) and one auxiliary function (Function CompareRows)
              If necessary, you can modify little bits of the code.

            • #787378

              This is great! Thanks!

            • #787362

              The attached zipped workbook uses a macro and a function to sort the data and remove duplicates, keeping the most recent date. To see the effect, press Alt+F8 in the workbook, select FilterUnique and click Run. You should not have macro security set to high, for that disables macros completely. (I recommend setting it to medium.)

            • #787342

              See the attachment. Rows 1 and 2 would not be unique even though the dates are different. Rows 3 and 4 would be unique. Right now, the advanced filtering gets me to this stage, but I have still having to go through 20K rows to see if there are any duplicates. I would like to filter out any rows that the only uniqueness is the date.

            • #787224

              Another idea is to use an autofilter, though it will require 2 additional columns. This method assumes that the lowest rows have the lowest dates. It picks the dates from the lowest rows of the “multiple unique” entries.

              The first column (could be hidden) would be a concatenation of all the columns that you want to be considered in the unique. I will assume it is in Col G and the data starts in row 2 (header in row 1) so G2 would be something like:
              =B2&C2&D2&E2&F2
              to find the unique entries in B-F (A has the date, it is not included)

              Then in H2:
              =countif($G$2:G2,G2)=1

              Copy G2:H2 down all the rows with data

              Add the autofilter: data – filter – autofilter. Then select col H dropdown and select TRUE. You will get the unique list of all the items in cols B-F with all other cols the values from the lowest rows. If you need to be lowest date it will have to be sorted by date first.

              Steve

          • #787198

            I prefer the most recent date.

        • #787192

          Here is a small example table:

          Col1 Col2 Col3 Col4
          north widget red 02/11/04
          north widget red 02/18/04
          north widget blue 02/18/04

          If we ignore Col4 when deciding which rows are unique, we end up with this:

          Col1 Col2 Col3 Col4
          north widget red ???
          north widget blue 02/18/04

          What should be in the place of the question marks?

      • #787176

        I just want one record. If the other 3 columns make the row unique, the date does not matter, but I still need to capture the date as it is associated to the rest of the row.

    • #787174

      But what if two records are identical but for the date field? If you ignore the date field, you will end up with one record. Which of the two dates should be used for this single record? Or do I misinterpret your question

    Viewing 1 reply thread
    Reply To: Advanced Filtering Unique Records (XP)

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information: