Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1   Report Post  
Karl Burrows
 
Posts: n/a
Default Extract Unique Values, Then Extract Again to Remove Suffixes

I have a list of builders where I want to extract all the unique values.
Using =IF(COUNTIF($A$1:A1,A1)=1,A!,"") works fine to get the first set of
unique values, but I need to extract it a bit further. The builder names
may be Ryan 50, Ryan 60 or just Ryan. I need to combine those values to
come up with the final list of unique builders that just says Ryan. I'm not
sure if I can strip off anything from the end as some are Ryan - Greenbrier.
I thought maybe removing everything to the right of a blank space, but those
entries have 2 blanks.

Any help would be greatly appreciated! Thanks!


  #2   Report Post  
James Hamilton
 
Posts: n/a
Default

Try using the filter (data filter) and then in the drop down list, select
"custom" and enter the custom feature you're looking for eg. contains "ryan".

"Karl Burrows" wrote:

I have a list of builders where I want to extract all the unique values.
Using =IF(COUNTIF($A$1:A1,A1)=1,A!,"") works fine to get the first set of
unique values, but I need to extract it a bit further. The builder names
may be Ryan 50, Ryan 60 or just Ryan. I need to combine those values to
come up with the final list of unique builders that just says Ryan. I'm not
sure if I can strip off anything from the end as some are Ryan - Greenbrier.
I thought maybe removing everything to the right of a blank space, but those
entries have 2 blanks.

Any help would be greatly appreciated! Thanks!



  #3   Report Post  
Karl Burrows
 
Posts: n/a
Default

It can't use a filter, it has to be derived from a formula. It drives other
reporting. In addition, there are other builders, so they would have to be
filtered as well. I almost need something like reverse concatenation.

"James Hamilton" wrote in message
...
Try using the filter (data filter) and then in the drop down list, select
"custom" and enter the custom feature you're looking for eg. contains
"ryan".

"Karl Burrows" wrote:

I have a list of builders where I want to extract all the unique values.
Using =IF(COUNTIF($A$1:A1,A1)=1,A!,"") works fine to get the first set of
unique values, but I need to extract it a bit further. The builder names
may be Ryan 50, Ryan 60 or just Ryan. I need to combine those values to
come up with the final list of unique builders that just says Ryan. I'm
not
sure if I can strip off anything from the end as some are Ryan -
Greenbrier.
I thought maybe removing everything to the right of a blank space, but
those
entries have 2 blanks.

Any help would be greatly appreciated! Thanks!





  #4   Report Post  
 
Posts: n/a
Default

what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron

  #5   Report Post  
Karl Burrows
 
Posts: n/a
Default

It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home

What I am doing is about impossible in Access, as it is pulling lot data
into a formatted report spread over a 6 year rolling period. I would still
have the same issue in Access as well if I were to query the data for these
values.

Thanks!

wrote in message
oups.com...
what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron




  #6   Report Post  
James Hamilton
 
Posts: n/a
Default

I note that there is a space between the unique information eg. "ryan" and
the other data on the end eg "townhomes"....so try this:

1. highlight the column with the data in it
2. Go to DATA TEXT TO COLUMNS, then make sure "delimited" is selected and
hit NEXT, then make sure that "SPACE" is checked, then hit NEXT, then FINISH.

Can you let me know if this works?

"Karl Burrows" wrote:

It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home

What I am doing is about impossible in Access, as it is pulling lot data
into a formatted report spread over a 6 year rolling period. I would still
have the same issue in Access as well if I were to query the data for these
values.

Thanks!

wrote in message
oups.com...
what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron



  #7   Report Post  
Bob Phillips
 
Posts: n/a
Default

Karl,

You could use a helper column to get the sans number values

=SUBSTITUTE(A2,MID(A2,MATCH(FALSE,ISERROR(1*MID(A2 ,ROW(INDIRECT("1:10")),1))
,0),10-SUM(1*ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1)))) *1,"")

and then count uniques here. The formula is an array formula, so commit with
Ctrl-Shift-Enter.

--
HTH

Bob Phillips

"Karl Burrows" wrote in message
...
It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home

What I am doing is about impossible in Access, as it is pulling lot data
into a formatted report spread over a 6 year rolling period. I would

still
have the same issue in Access as well if I were to query the data for

these
values.

Thanks!

wrote in message
oups.com...
what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron




  #8   Report Post  
Domenic
 
Posts: n/a
Default

Assuming that Column A contains your data, enter the following formula
that needs to be confirmed with CONTROL+SHIFT+ENTER in B1 and copy down:

=INDEX($D$1:$D$3,MATCH(TRUE,ISNUMBER(SEARCH($D$1:$ D$3,A1)),0))

....where D1:D3 contains the values to extract, such as Ryan, Mulvaney,
and KB Home.

Hope this helps!

In article ,
"Karl Burrows" wrote:

It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home

What I am doing is about impossible in Access, as it is pulling lot data
into a formatted report spread over a 6 year rolling period. I would still
have the same issue in Access as well if I were to query the data for these
values.

Thanks!

wrote in message
oups.com...
what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron

  #9   Report Post  
Alan Beban
 
Posts: n/a
Default

Where does this assume the OP's data is, and in what row(s) of the
helper column is it to be array entered?

Alan Beban

Bob Phillips wrote:
Karl,

You could use a helper column to get the sans number values

=SUBSTITUTE(A2,MID(A2,MATCH(FALSE,ISERROR(1*MID(A2 ,ROW(INDIRECT("1:10")),1))
,0),10-SUM(1*ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1)))) *1,"")

and then count uniques here. The formula is an array formula, so commit with
Ctrl-Shift-Enter.

  #10   Report Post  
Karl Burrows
 
Posts: n/a
Default

I have tried text to columns, but it tries to split out the formula instead
of the value in the cell (even if I create a new cell and convert it to
text). Then, there are some builders that have several spaces in the name
(KB Home 60). I tried to use LEFT to pull out to the next blank space, but
that only pulled KB!

"James Hamilton" wrote in message
...
I note that there is a space between the unique information eg. "ryan" and
the other data on the end eg "townhomes"....so try this:

1. highlight the column with the data in it
2. Go to DATA TEXT TO COLUMNS, then make sure "delimited" is selected and
hit NEXT, then make sure that "SPACE" is checked, then hit NEXT, then
FINISH.

Can you let me know if this works?

"Karl Burrows" wrote:

It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home

What I am doing is about impossible in Access, as it is pulling lot data
into a formatted report spread over a 6 year rolling period. I would
still
have the same issue in Access as well if I were to query the data for
these
values.

Thanks!

wrote in message
oups.com...
what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron







  #11   Report Post  
Alan Beban
 
Posts: n/a
Default

Well, perhaps I'm not being imaginative enough, but it is difficult for
me to conceive of a formula that will be able to treat Ryan Townhomes
like a duplicate of Ryan and not treat KB Home like a duplicate of KB.

Alan Beban

Karl Burrows wrote:
It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home

What I am doing is about impossible in Access, as it is pulling lot data
into a formatted report spread over a 6 year rolling period. I would still
have the same issue in Access as well if I were to query the data for these
values.

Thanks!

wrote in message
oups.com...
what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron


  #12   Report Post  
Karl Burrows
 
Posts: n/a
Default

That is really close! It works on about 50% of the builder names.
Examples:

Works on:
Beazer Homes 50' & 60'
D.R. Horton 40 & 50
Ryan - Greenbrier
Ryan - Aldridge
Ryan Townhomes
Westminster Townhomes
KB Home 40 & 50

Does not work on:
C.P. Morgan 40' & 50'
Lennar Homes 40, 50, 55 & 60
McCar Homes & McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes

Should convert to:
C.P. Morgan
Lennar Homes
McCar Homes
Mulvaney

Strange how it works on some similar and not others. This is really good!
Thanks!

"Domenic" wrote in message
...
Assuming that Column A contains your data, enter the following formula
that needs to be confirmed with CONTROL+SHIFT+ENTER in B1 and copy down:

=INDEX($D$1:$D$3,MATCH(TRUE,ISNUMBER(SEARCH($D$1:$ D$3,A1)),0))

....where D1:D3 contains the values to extract, such as Ryan, Mulvaney,
and KB Home.

Hope this helps!

In article ,
"Karl Burrows" wrote:

It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home

What I am doing is about impossible in Access, as it is pulling lot data
into a formatted report spread over a 6 year rolling period. I would
still
have the same issue in Access as well if I were to query the data for
these
values.

Thanks!

wrote in message
oups.com...
what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron



  #13   Report Post  
Karl Burrows
 
Posts: n/a
Default

That works in some values, but not others. Pulls KB Home as KB, Lennar
Homes as Lennar, Saussy Burbank as Saussy, D.R. Horton as D.R. It does work
on the ones that are only one name builders like Mulvaney, though.

This is a tough one!

"Bob Phillips" wrote in message
...
Karl,

You could use a helper column to get the sans number values

=SUBSTITUTE(A2,MID(A2,MATCH(FALSE,ISERROR(1*MID(A2 ,ROW(INDIRECT("1:10")),1))
,0),10-SUM(1*ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1)))) *1,"")

and then count uniques here. The formula is an array formula, so commit with
Ctrl-Shift-Enter.

--
HTH

Bob Phillips

"Karl Burrows" wrote in message
...
It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home

What I am doing is about impossible in Access, as it is pulling lot data
into a formatted report spread over a 6 year rolling period. I would

still
have the same issue in Access as well if I were to query the data for

these
values.

Thanks!

wrote in message
oups.com...
what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron





  #14   Report Post  
Karl Burrows
 
Posts: n/a
Default

It is tough when builder names vary from one word names to initials and name
and the multiple word names.

"Alan Beban" wrote in message
...
Well, perhaps I'm not being imaginative enough, but it is difficult for
me to conceive of a formula that will be able to treat Ryan Townhomes
like a duplicate of Ryan and not treat KB Home like a duplicate of KB.

Alan Beban

Karl Burrows wrote:
It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home

What I am doing is about impossible in Access, as it is pulling lot data
into a formatted report spread over a 6 year rolling period. I would
still
have the same issue in Access as well if I were to query the data for
these
values.

Thanks!

wrote in message
oups.com...
what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron




  #15   Report Post  
Harlan Grove
 
Posts: n/a
Default

Karl Burrows wrote...
It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home

....

It's not too difficult to get Ryan and KB Home using the following udf,
which returns an array.


Function foo(r As Range) As Variant
Dim d As Object
Dim c As Variant, x As Variant, t As String

Set d = CreateObject("Scripting.Dictionary")

For Each c In r
t = c.Text
If d.Exists(t) Then d.Item(t) = d.Item(t) + 1 _
Else d.Add Key:=t, Item:=1
Next c

For Each c In d.Keys
For Each x In d.Keys
If x < c And x Like c & "*" Then d.Remove Key:=x
Next x
Next c

foo = Application.WorksheetFunction.Transpose(d.Keys)
End Function


But reducing out Mulvaney is much more difficult in general because
left substrings could be common to several distinct records, e.g.,

John Smith Builders
John Smith & Sons Construction

There may be approaches you could take using fuzzy string matching, but
you may find it expedient to use the udf above to filter out most
'duplicates', then remove the remaining ones manually.



  #16   Report Post  
Domenic
 
Posts: n/a
Default

If A2:A14 contains...

Beazer Homes 50' & 60'
D.R. Horton 40 & 50
Ryan - Greenbrier
Ryan - Aldridge
Ryan Townhomes
Westminster Townhomes
KB Home 40 & 50
C.P. Morgan 40' & 50'
Lennar Homes 40, 50, 55 & 60
McCar Homes & McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes

....and D2:D10 contains...

Beazer
C.P. Morgan
D.R. Horton
KB Home
Lennar
McCar
Mulvaney
Ryan
Westminster

....enter the following formula in B2, and copy down...

=INDEX($D$2:$D$10,MATCH(TRUE,ISNUMBER(SEARCH($D$2: $D$10,A2)),0))

....confirmed with CONTROL+SHIFT+ENTER. You should get the following
results...

Beazer
D.R. Horton
Ryan
Ryan
Ryan
Westminster
KB Home
C.P. Morgan
Lennar
McCar
Mulvaney
Mulvaney
Mulvaney

Then, if you want a unique list -- it only just dawned on me that this
is probably what you want :) -- you can either use 'Advanced Filter'
and check 'Unique records only' or use the following formula...

C2, copied down until you get #N/A:

=INDEX(B2:$B$14,MATCH(0,COUNTIF($C$1:C1,B2:$B$14), 0))

....confirmed with CONTROL+SHIFT+ENTER.

Hope this helps!

In article ,
"Karl Burrows" wrote:

That is really close! It works on about 50% of the builder names.
Examples:

Works on:
Beazer Homes 50' & 60'
D.R. Horton 40 & 50
Ryan - Greenbrier
Ryan - Aldridge
Ryan Townhomes
Westminster Townhomes
KB Home 40 & 50

Does not work on:
C.P. Morgan 40' & 50'
Lennar Homes 40, 50, 55 & 60
McCar Homes & McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes

Should convert to:
C.P. Morgan
Lennar Homes
McCar Homes
Mulvaney

Strange how it works on some similar and not others. This is really good!
Thanks!

  #17   Report Post  
Karl Burrows
 
Posts: n/a
Default

Problem is I don't have a column of named builders like you have in D2:D10.
Maybe this would be easier. Here is the exact list of builders they have.
This can also change as new subdivisions are added, but the list is
populated from an Access database. Builders may have more than one type of
lot in one subdivision or may have lots in multiple subdivisions, which is
what creates the need to identify builders this way for reporting. I don't
think this is going to be possible since Excel would have no way of knowing
that "Greenbrier" was not part of the builder name.

Beazer Homes
Beazer Homes 50'
Beazer Homes 60'
C.P. Morgan 40'
C.P. Morgan 50'
D.R. Horton
D.R. Horton
D.R. Horton 40
D.R. Horton 50
KB Home
KB Home 40
KB Home 50
Lennar Homes 40
Lennar Homes 50
Lennar Homes 55
Lennar Homes 60
McCar Homes
McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes
Pulte
Pulte
Ryan
Ryan - Aldridge
Ryan - Greenbrier
Ryan 60
Ryan 66
Ryan Townhomes
Ryan Townhomes
Saussy Burbank
Unsold Townhome Lots
Westminster
Westminster Townhomes


Thanks!

"Domenic" wrote in message
...
If A2:A14 contains...

Beazer Homes 50' & 60'
D.R. Horton 40 & 50
Ryan - Greenbrier
Ryan - Aldridge
Ryan Townhomes
Westminster Townhomes
KB Home 40 & 50
C.P. Morgan 40' & 50'
Lennar Homes 40, 50, 55 & 60
McCar Homes & McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes

....and D2:D10 contains...

Beazer
C.P. Morgan
D.R. Horton
KB Home
Lennar
McCar
Mulvaney
Ryan
Westminster

....enter the following formula in B2, and copy down...

=INDEX($D$2:$D$10,MATCH(TRUE,ISNUMBER(SEARCH($D$2: $D$10,A2)),0))

....confirmed with CONTROL+SHIFT+ENTER. You should get the following
results...

Beazer
D.R. Horton
Ryan
Ryan
Ryan
Westminster
KB Home
C.P. Morgan
Lennar
McCar
Mulvaney
Mulvaney
Mulvaney

Then, if you want a unique list -- it only just dawned on me that this
is probably what you want :) -- you can either use 'Advanced Filter'
and check 'Unique records only' or use the following formula...

C2, copied down until you get #N/A:

=INDEX(B2:$B$14,MATCH(0,COUNTIF($C$1:C1,B2:$B$14), 0))

....confirmed with CONTROL+SHIFT+ENTER.

Hope this helps!

In article ,
"Karl Burrows" wrote:

That is really close! It works on about 50% of the builder names.
Examples:

Works on:
Beazer Homes 50' & 60'
D.R. Horton 40 & 50
Ryan - Greenbrier
Ryan - Aldridge
Ryan Townhomes
Westminster Townhomes
KB Home 40 & 50

Does not work on:
C.P. Morgan 40' & 50'
Lennar Homes 40, 50, 55 & 60
McCar Homes & McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes

Should convert to:
C.P. Morgan
Lennar Homes
McCar Homes
Mulvaney

Strange how it works on some similar and not others. This is really good!
Thanks!



  #18   Report Post  
Domenic
 
Posts: n/a
Default

In article ,
"Karl Burrows" wrote:

Problem is I don't have a column of named builders like you have in D2:D10.


I should have said, 'enter a list of builders in D2:D10', as described.
So with the list of builders you've now provided, enter the following
list in D2:D13...

Beazer
C.P. Morgan
D.R. Horton
KB
Lennar
McCar
Mulvaney
Pulte
Ryan
Saussy
Unsold
Westminster

....and change the references in the formula accordingly. Does this help?
  #19   Report Post  
Alan Beban
 
Posts: n/a
Default

I don't know why you picked "Greenbrier" as a problem. Harlan Grove
provided code that would eliminate it when operating on the list below.
The real problem seems to be that your list of "builders names" does not
include all the names of the builders. In particular, it does not include

C.P. Morgan
Lennar Homes
McCar
Mulvaney

each of which appears from your descriptions to be a builder's name.
(I am assuming that "Unsold Townhome Lots" is equivalent to a builder's
name.)

Alan Beban

Karl Burrows wrote:
Problem is I don't have a column of named builders like you have in D2:D10.
Maybe this would be easier. Here is the exact list of builders they have.
This can also change as new subdivisions are added, but the list is
populated from an Access database. Builders may have more than one type of
lot in one subdivision or may have lots in multiple subdivisions, which is
what creates the need to identify builders this way for reporting. I don't
think this is going to be possible since Excel would have no way of knowing
that "Greenbrier" was not part of the builder name.

Beazer Homes
Beazer Homes 50'
Beazer Homes 60'
C.P. Morgan 40'
C.P. Morgan 50'
D.R. Horton
D.R. Horton
D.R. Horton 40
D.R. Horton 50
KB Home
KB Home 40
KB Home 50
Lennar Homes 40
Lennar Homes 50
Lennar Homes 55
Lennar Homes 60
McCar Homes
McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes
Pulte
Pulte
Ryan
Ryan - Aldridge
Ryan - Greenbrier
Ryan 60
Ryan 66
Ryan Townhomes
Ryan Townhomes
Saussy Burbank
Unsold Townhome Lots
Westminster
Westminster Townhomes


Thanks!

  #20   Report Post  
Karl Burrows
 
Posts: n/a
Default

I think we have gotten away from the actual problem. It doesn't matter who
the builder is (I only originally posted a partial list of builders), I want
to programmatically or through a formula remove all the uncommon information
from any builder that may be there at some point in time. Builders may have
a suffix to further define the lots they own in a neighborhood, so it
doesn't matter who is in my list now. I also don't want to create a list of
the builders to match to the list to create the unique values. I am doing
that already and want to eliminate that portion of the worksheet input.

It boils down to taking a column of builder names and stripping off the data
that is not common to all values for that builder, whether that means
removing 60', - 60, - Greenbrier, or any other designation they may come up
with. I think that is where the hang up is. Unless you can compare all the
builder values and then say well these are pretty much alike other than the
"60" at the end or "townhome" or other, so let's remove that portion. I
haven't had a chance to try to coding and maybe that is what it does.

Does all this make sense? Thanks for everyone's patience.

"Alan Beban" wrote in message
...
I don't know why you picked "Greenbrier" as a problem. Harlan Grove
provided code that would eliminate it when operating on the list below.
The real problem seems to be that your list of "builders names" does not
include all the names of the builders. In particular, it does not include

C.P. Morgan
Lennar Homes
McCar
Mulvaney

each of which appears from your descriptions to be a builder's name.
(I am assuming that "Unsold Townhome Lots" is equivalent to a builder's
name.)

Alan Beban

Karl Burrows wrote:
Problem is I don't have a column of named builders like you have in
D2:D10.
Maybe this would be easier. Here is the exact list of builders they have.
This can also change as new subdivisions are added, but the list is
populated from an Access database. Builders may have more than one type
of
lot in one subdivision or may have lots in multiple subdivisions, which is
what creates the need to identify builders this way for reporting. I
don't
think this is going to be possible since Excel would have no way of
knowing
that "Greenbrier" was not part of the builder name.

Beazer Homes
Beazer Homes 50'
Beazer Homes 60'
C.P. Morgan 40'
C.P. Morgan 50'
D.R. Horton
D.R. Horton
D.R. Horton 40
D.R. Horton 50
KB Home
KB Home 40
KB Home 50
Lennar Homes 40
Lennar Homes 50
Lennar Homes 55
Lennar Homes 60
McCar Homes
McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes
Pulte
Pulte
Ryan
Ryan - Aldridge
Ryan - Greenbrier
Ryan 60
Ryan 66
Ryan Townhomes
Ryan Townhomes
Saussy Burbank
Unsold Townhome Lots
Westminster
Westminster Townhomes


Thanks!





  #21   Report Post  
Harlan Grove
 
Posts: n/a
Default

Karl Burrows wrote...
....
It boils down to taking a column of builder names and stripping off the data
that is not common to all values for that builder, whether that means
removing 60', - 60, - Greenbrier, or any other designation they may come up
with. I think that is where the hang up is. Unless you can compare all the
builder values and then say well these are pretty much alike other than the
"60" at the end or "townhome" or other, so let's remove that portion. I
haven't had a chance to try to coding and maybe that is what it does.

....

The problem is that some of the qualifiers added to some of the builder
names could be legitimate parts of a person's or company's name. I'm
not saying that's in fact the case, just that it could be. For example,
Home and House can be surnames.

If the only added qualifiers you have to deal with involve anything
beginning with a decimal numeral or a hyphen, you could use regular
expressions to remove them. But you also have normal words appended
with no more than a space separating them from the builder name. Unless
*YOU* could compile an exhaustive list of such words that would always
be deleted and never erroneously truncate any builder's name, then you
could use a list of these words as tokens to remove from your records.
Then feed what's left through a dictionary object to eliminate
duplicates.

  #22   Report Post  
Karl Burrows
 
Posts: n/a
Default

Thus, the problem. Because each builder can have a unique name with some of
those qualifiers in there, it is about impossible to identify their true
name. I think maybe developing a naming convention to add the hyphen or
something else that we can tell it to strip everything right of that
character is going to be the only way to go. As it is now, it is truly
"fuzzy logic!"

Thanks for all the help!

"Harlan Grove" wrote in message
oups.com...
Karl Burrows wrote...
....
It boils down to taking a column of builder names and stripping off the
data
that is not common to all values for that builder, whether that means
removing 60', - 60, - Greenbrier, or any other designation they may come up
with. I think that is where the hang up is. Unless you can compare all
the
builder values and then say well these are pretty much alike other than the
"60" at the end or "townhome" or other, so let's remove that portion. I
haven't had a chance to try to coding and maybe that is what it does.

....

The problem is that some of the qualifiers added to some of the builder
names could be legitimate parts of a person's or company's name. I'm
not saying that's in fact the case, just that it could be. For example,
Home and House can be surnames.

If the only added qualifiers you have to deal with involve anything
beginning with a decimal numeral or a hyphen, you could use regular
expressions to remove them. But you also have normal words appended
with no more than a space separating them from the builder name. Unless
*YOU* could compile an exhaustive list of such words that would always
be deleted and never erroneously truncate any builder's name, then you
could use a list of these words as tokens to remove from your records.
Then feed what's left through a dictionary object to eliminate
duplicates.


  #23   Report Post  
Alan Beban
 
Posts: n/a
Default

Harlan Grove wrote:
Karl Burrows wrote...
...

It boils down to taking a column of builder names and stripping off the data
that is not common to all values for that builder, whether that means
removing 60', - 60, - Greenbrier, or any other designation they may come up
with. I think that is where the hang up is. Unless you can compare all the
builder values and then say well these are pretty much alike other than the
"60" at the end or "townhome" or other, so let's remove that portion. I
haven't had a chance to try to coding and maybe that is what it does.


...

The problem is that some of the qualifiers added to some of the builder
names could be legitimate parts of a person's or company's name. I'm
not saying that's in fact the case, just that it could be. . . .


Unless I misunderstand, we already know from the OP's 3rd posting in
this thread that it is the case. He made it clear that in Ryan
Townhomes, the builder's name is Ryan and Townhomes is a suffix; and
that in KB Home the builder's name is KB Home and Home is not a suffix.

Alan Beban
  #24   Report Post  
Harlan Grove
 
Posts: n/a
Default

"Alan Beban" wrote...
....
Unless I misunderstand, we already know from the OP's 3rd posting
in this thread that it is the case. He made it clear that in Ryan
Townhomes, the builder's name is Ryan and Townhomes is a suffix;
and that in KB Home the builder's name is KB Home and Home is not
a suffix.


Then it's the same situation as parsing surnames from a list of peoples from
many original nationalities but with inappropriate English capitalization
rules applied. E.g.,

Charles Der
Ruud Van Der Aalter
Nguyen Van Tieu

You come up with a rule to handle all these correctly, and you may have a
prayer handling general company names which follow even fewer rules. [This
is rhetorical. It's theoretically possible if there's a sufficiently
complete rules base, but it'd be expedient to use an approach that works
80-90% of the time and correct the rest manually.]


Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules

Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On


Similar Threads
Thread Thread Starter Forum Replies Last Post
Populate a column by extracting unique values from another column? Mike Palmer Excel Worksheet Functions 2 June 10th 05 03:21 PM
Count Unique Values annie Excel Worksheet Functions 1 June 9th 05 07:19 AM
Extract AutoFilter Column Values? dwayneh Excel Discussion (Misc queries) 2 June 3rd 05 04:18 AM
Count number of Unique values Alan Excel Worksheet Functions 4 January 6th 05 08:05 PM
Formula to list unique values JaneC Excel Worksheet Functions 4 December 10th 04 12:25 AM


All times are GMT +1. The time now is 08:26 AM.

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 ExcelBanter.
The comments are property of their posters.
 

About Us

"It's about Microsoft Excel"