  #41   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 538
Default Read (and parse) file on the web

Robert Baer wrote:

What is with this a*hole posting this sh*t here?


It's just spam. Ignore it.

--
If you would succeed,
you must reduce your strategy to its point of application.
  #42   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

I wish to read and parse every page of
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber="
where the page number goes from 5 to 999.
On each page, find "<a href="/NSN/5960 [it is longer, but that is
the start].
Given the full number (eg: <a href="/NSN/5960-00-831-8683"), open
a new related page "https://www.nsncenter.com/NSN/5960-00-831-8683"
and find the line ending "(MCRL)".
Read about 4 lines to <a href="/PartNumber/ which is <a
href="/PartNumber/GV4S1400" in this case.
Save/write that line plus the next three; close this secondary online
URL and step to next "<a href="/NSN/5960 to process the same way.
Continue to end of the page, close that URL and open the next
page.


Robert,
Here's what I have after parsing 'parent' pages for a list of its
links:

N/A
5960-00-503-9529
5960-00-504-8401
5960-01-035-3901
5960-01-029-2766
5960-00-617-4105
5960-00-729-5602
5960-00-826-1280
5960-00-754-5316
5960-00-962-5391
5960-00-944-4671

This is pg1 where the 1st link doesn't contain "5960" and so will be
ignored.

Each link's text is appended to this URL to bring up its 'child' pg:

https://www.nsncenter.com/NSN/

Each child page is parsed for the following 4 lines:

<TD style="VERTICAL-ALIGN: middle" align=center><A
href="/PartNumber/GV3S2800">GV3S2800</A></TD>
<TD style="HEIGHT: 60px; WIDTH: 125px; VERTICAL-ALIGN: middle" noWrap
align=center>&nbsp;&nbsp;<A
href="/CAGE/63060">63060</A>&nbsp;&nbsp;</TD>
<TD style="VERTICAL-ALIGN: middle" align=center>&nbsp;&nbsp;<A
href="/CAGE/63060"><IMG class=img-thumbnail
src="https://placehold.it/90x45?text=No%0DImage%0DYet" width=90
height=45></A>&nbsp;&nbsp;</TD>
<TD style="VERTICAL-ALIGN: middle" text-align="center"><A title="CAGE
63060" href="/CAGE/63060">HEICO OHMITE LLC</A></TD>

I'm stripping html syntax to get this data:

Line1: PartNumber/GV3S2800
Line2: CAGE/63060
Line3: https://placehold.it/90x45?text=No%0DImage%0DYet
Line4: HEICO OHMITE LLC
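
(For reference, the string-level parsing involved is short in VBA. A
minimal sketch; GetHrefTarget and sLine are illustrative names, not the
project's actual code:)

' Pull the href target (e.g. "PartNumber/GV3S2800") out of one source line.
Function GetHrefTarget(sLine As String) As String
    Dim p1 As Long, p2 As Long
    p1 = InStr(sLine, "href=""/")               ' locate the href attribute
    If p1 = 0 Then Exit Function                ' no link on this line
    p1 = p1 + Len("href=""/")                   ' first char after the leading /
    p2 = InStr(p1, sLine, """")                 ' closing quote
    If p2 = 0 Then Exit Function                ' malformed line
    GetHrefTarget = Mid$(sLine, p1, p2 - p1)    ' e.g. PartNumber/GV3S2800
End Function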


The output file has these fieldnames in the 1st line:

NSN Item#,Description,Part#,MCRL,CAGE,Source

I left the 3rd line URL out since, outside its host webpage, it'll be
useless to you. I need to know from you if the 3rd line URL is needed!

Otherwise, the output file will have 1 line per item so it can be used
as the db file "NSN_5960_ElectronTube.dat". I invite your suggestion
for filename...

I could extend the collected data to include...

Reference Number/DRN_3570
Entity Code/DRN_9250
Category Code/DRN_2910
Variation Code/DRN_4780

...where the fieldnames would then be:

Item#,Part#,MCRL,CAGE,Source,Ref,Entity,Category,Variation

The 1st record will be:

5960-00-503-9529,GV3S2800,3302008,63060,HEICO OHMITE
LLC,DRN_3570,DRN_9250,DRN_2910,DRN_4780

Output file size for 1 parent pg is 1KB; for 10 parent pgs is 10KB. You
could have 1000 parent pgs of data stored in a 1MB file.

Your feedback is appreciated...

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #43   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

Typos...

Robert,
Here's what I have after parsing 'parent' pages for a list of its
links:

N/A
5960-00-503-9529
5960-00-504-8401
5960-01-035-3901
5960-01-029-2766
5960-00-617-4105
5960-00-729-5602
5960-00-826-1280
5960-00-754-5316
5960-00-962-5391
5960-00-944-4671

This is pg1 where the 1st link doesn't contain "5960" and so will be
ignored.

Each link's text is appended to this URL to bring up its 'child' pg:

https://www.nsncenter.com/NSN/

Each child page is parsed for the following 4 lines:

<TD style="VERTICAL-ALIGN: middle" align=center><A
href="/PartNumber/GV3S2800">GV3S2800</A></TD>
<TD style="HEIGHT: 60px; WIDTH: 125px; VERTICAL-ALIGN: middle"
noWrap align=center>&nbsp;&nbsp;<A
href="/CAGE/63060">63060</A>&nbsp;&nbsp;</TD>
<TD style="VERTICAL-ALIGN: middle" align=center>&nbsp;&nbsp;<A
href="/CAGE/63060"><IMG class=img-thumbnail
src="https://placehold.it/90x45?text=No%0DImage%0DYet" width=90
height=45></A>&nbsp;&nbsp;</TD>
<TD style="VERTICAL-ALIGN: middle" text-align="center"><A
title="CAGE 63060" href="/CAGE/63060">HEICO OHMITE LLC</A></TD>

I'm stripping html syntax to get this data:

Line1: PartNumber/GV3S2800
Line2: CAGE/63060
Line3: https://placehold.it/90x45?text=No%0DImage%0DYet
Line4: HEICO OHMITE LLC


The output file has these fieldnames in the 1st line:

NSN Item#,Description,Part#,MCRL,CAGE,Source

I left the 3rd line URL out since, outside its host webpage, it'll be
useless to you. I need to know from you if the 3rd line URL is
needed!

Otherwise, the output file will have 1 line per item so it can be
used as the db file "NSN_5960_ElectronTube.dat". I invite your
suggestion for filename...

I could extend the collected data to include...

Reference Number/DRN_3570
Entity Code/DRN_9250
Category Code/DRN_2910
Variation Code/DRN_4780

..where the fieldnames would then be:

Item#,Part#,MCRL,CAGE,Source,REF,ENT,CAT,VAR

The 1st record will be:

5960-00-503-9529,GV3S2800,3302008,63060,HEICO OHMITE
LLC,DRN_3570,DRN_9250,DRN_2910,DRN_4780

Output file size for 1 parent pg is 1KB; for 10 parent pgs is 10KB.
You could have 1000 parent pgs of data stored in a 1MB file.

Your feedback is appreciated...


--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #44   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

GS wrote:
I wish to read and parse every page of
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber="
where the page number goes from 5 to 999.
On each page, find "<a href="/NSN/5960 [it is longer, but that is the
start].
Given the full number (eg: <a href="/NSN/5960-00-831-8683"), open a
new related page "https://www.nsncenter.com/NSN/5960-00-831-8683" and
find the line ending "(MCRL)".
Read about 4 lines to <a href="/PartNumber/ which is <a
href="/PartNumber/GV4S1400" in this case.
Save/write that line plus the next three; close this secondary online
URL and step to next "<a href="/NSN/5960 to process the same way.
Continue to end of the page, close that URL and open the next page.


Robert,
Here's what I have after parsing 'parent' pages for a list of its links:

N/A
5960-00-503-9529
5960-00-504-8401
5960-01-035-3901
5960-01-029-2766
5960-00-617-4105
5960-00-729-5602
5960-00-826-1280
5960-00-754-5316
5960-00-962-5391
5960-00-944-4671

This is pg1 where the 1st link doesn't contain "5960" and so will be
ignored.

Each link's text is appended to this URL to bring up its 'child' pg:

https://www.nsncenter.com/NSN/

Each child page is parsed for the following 4 lines:

<TD style="VERTICAL-ALIGN: middle" align=center><A
href="/PartNumber/GV3S2800">GV3S2800</A></TD>
<TD style="HEIGHT: 60px; WIDTH: 125px; VERTICAL-ALIGN: middle" noWrap
align=center>&nbsp;&nbsp;<A href="/CAGE/63060">63060</A>&nbsp;&nbsp;</TD>
<TD style="VERTICAL-ALIGN: middle" align=center>&nbsp;&nbsp;<A
href="/CAGE/63060"><IMG class=img-thumbnail
src="https://placehold.it/90x45?text=No%0DImage%0DYet" width=90
height=45></A>&nbsp;&nbsp;</TD>
<TD style="VERTICAL-ALIGN: middle" text-align="center"><A title="CAGE
63060" href="/CAGE/63060">HEICO OHMITE LLC</A></TD>

I'm stripping html syntax to get this data:

Line1: PartNumber/GV3S2800
Line2: CAGE/63060
Line3: https://placehold.it/90x45?text=No%0DImage%0DYet
Line4: HEICO OHMITE LLC


The output file has these fieldnames in the 1st line:

NSN Item#,Description,Part#,MCRL,CAGE,Source

I left the 3rd line URL out since, outside its host webpage, it'll be
useless to you. I need to know from you if the 3rd line URL is needed!

Otherwise, the output file will have 1 line per item so it can be used
as the db file "NSN_5960_ElectronTube.dat". I invite your suggestion for
filename...

I could extend the collected data to include...

Reference Number/DRN_3570
Entity Code/DRN_9250
Category Code/DRN_2910
Variation Code/DRN_4780

..where the fieldnames would then be:

Item#,Part#,MCRL,CAGE,Source,Ref,Entity,Category,Variation

The 1st record will be:

5960-00-503-9529,GV3S2800,3302008,63060,HEICO OHMITE
LLC,DRN_3570,DRN_9250,DRN_2910,DRN_4780

Output file size for 1 parent pg is 1KB; for 10 parent pgs is 10KB. You
could have 1000 parent pgs of data stored in a 1MB file.

Your feedback is appreciated...

WOW!
Absolutely PERFECT!
You are correct, #1) do not need that line 3, and #2) do not need the
extended info.

File name(s): for PageNumber=1 I would use 5960_001.TXT, ...up to
PageNumber=999 as 5960_999.TXT, and that would preserve order.
*OR*
Reading & parsing from PageNumber=1 to PageNumber=999, one could
append to the same file (name NSN_5960.TXT); might as well - makes it
easier to pour into a single Excel file.
Either way is fine.

I have found a way to get rid of items that are not strictly electron
tubes and/or not regulators; that way you do not have to parse out these
"unfit" items from the first-page description. Use:
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator%20and%20%22ELECTRON%20TUBE%22&PageNumber=1"
Naturally, PageNumber still goes from 1 to 999.
Note the implied "(", ")" and " "; human-readable "5960 regulator and
(ELECTRON TUBE)".
As far as i can tell, using that shows no undesirable parts.
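
(In VBA, stepping PageNumber from 1 to 999 is just string concatenation
onto the base URL; a minimal sketch, names illustrative:)

Sub WalkParentPages()
    Const sBase$ = "https://www.nsncenter.com/NSNSearch?q=5960%20regulator%20and%20%22ELECTRON%20TUBE%22&PageNumber="
    Dim n As Long, sPgUrl As String
    For n = 1 To 999
        sPgUrl = sBase & n      ' e.g. "...&PageNumber=17"
        ' fetch and parse sPgUrl here
    Next n
End Sub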

Thanks!
PS: i found WGET to be non-useful (a) it truncates the filename (b) it
buggers it to partial gibberish.
  #45   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#3

GS wrote:
Typos...

Robert,
Here's what I have after parsing 'parent' pages for a list of its links:

N/A
5960-00-503-9529
5960-00-504-8401
5960-01-035-3901
5960-01-029-2766
5960-00-617-4105
5960-00-729-5602
5960-00-826-1280
5960-00-754-5316
5960-00-962-5391
5960-00-944-4671

This is pg1 where the 1st link doesn't contain "5960" and so will be
ignored.

Each link's text is appended to this URL to bring up its 'child' pg:

https://www.nsncenter.com/NSN/

Each child page is parsed for the following 4 lines:

<TD style="VERTICAL-ALIGN: middle" align=center><A
href="/PartNumber/GV3S2800">GV3S2800</A></TD>
<TD style="HEIGHT: 60px; WIDTH: 125px; VERTICAL-ALIGN: middle" noWrap
align=center>&nbsp;&nbsp;<A href="/CAGE/63060">63060</A>&nbsp;&nbsp;</TD>
<TD style="VERTICAL-ALIGN: middle" align=center>&nbsp;&nbsp;<A
href="/CAGE/63060"><IMG class=img-thumbnail
src="https://placehold.it/90x45?text=No%0DImage%0DYet" width=90
height=45></A>&nbsp;&nbsp;</TD>
<TD style="VERTICAL-ALIGN: middle" text-align="center"><A title="CAGE
63060" href="/CAGE/63060">HEICO OHMITE LLC</A></TD>

I'm stripping html syntax to get this data:

Line1: PartNumber/GV3S2800
Line2: CAGE/63060
Line3: https://placehold.it/90x45?text=No%0DImage%0DYet
Line4: HEICO OHMITE LLC


The output file has these fieldnames in the 1st line:

NSN Item#,Description,Part#,MCRL,CAGE,Source

I left the 3rd line URL out since, outside its host webpage, it'll be
useless to you. I need to know from you if the 3rd line URL is needed!

Otherwise, the output file will have 1 line per item so it can be used
as the db file "NSN_5960_ElectronTube.dat". I invite your suggestion
for filename...

I could extend the collected data to include...

Reference Number/DRN_3570
Entity Code/DRN_9250
Category Code/DRN_2910
Variation Code/DRN_4780

..where the fieldnames would then be:

Item#,Part#,MCRL,CAGE,Source,REF,ENT,CAT,VAR

The 1st record will be:

5960-00-503-9529,GV3S2800,3302008,63060,HEICO OHMITE
LLC,DRN_3570,DRN_9250,DRN_2910,DRN_4780

Output file size for 1 parent pg is 1KB; for 10 parent pgs is 10KB.
You could have 1000 parent pgs of data stored in a 1MB file.

Your feedback is appreciated...


Like i said, PERFECT!
And you are correct, do not need line 3 nor the extended data.
Please check my other answer for a corrected search term; the
!corrected! human-readable version of the URL is:
https://www.nsncenter.com/NSNSearch?q=5960 regulator and "ELECTRON
TUBE"&PageNumber=1
The %20 is a virtual space, and the %22 is a virtual quote.
((guess that is the proper term))



  #46   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

You are correct, #1) do not need that line 3, and #2) do not need
the extended info.


Ok then, fieldnames will be: Item#,Part#,MCRL,CAGE,Source

File name(s): for PageNumber=1 I would use 5960_001.TXT, ...up to
PageNumber=999 as 5960_999.TXT, and that would preserve
order.
*OR*
Reading & parsing from PageNumber=1 to PageNumber=999, one could
append to the same file (name NSN_5960.TXT); might as well - makes it
easier to pour into a single Excel file.
Either way is fine.


Ok, then output filename will be: NSN_5960.txt

I have found a way to get rid of items that are not strictly
electron tubes and/or not regulators; that way you do not have to
parse out these "unfit" items from the first-page description. Use:
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator%20and%20%22ELECTRON%20TUBE%22&PageNumber=1"
Naturally, PageNumber still goes from 1 to 999.
Note the implied "(", ")" and " "; human-readable "5960 regulator
and (ELECTRON TUBE)".
As far as i can tell, using that shows no undesirable parts.


Works nice! Now I get 11 5960 items per parent page.

Thanks!
PS: i found WGET to be non-useful (a) it truncates the filename (b)
it buggers it to partial gibberish


What is WGET?

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #47   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

I still do not understand what magic you used.

I'm using the MS WebBrowser control and a textbox on a worksheet!
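
(In outline that approach looks like the sketch below: navigate, wait
until the control is no longer busy, then copy the document HTML into
the textbox. WebBrowser1 and txtPgSrc match names mentioned later in
the thread; the rest is a guess at the shape, not the actual project
code:)

With Sheet1.WebBrowser1
    .Navigate "https://www.nsncenter.com/NSN/5960-00-503-9529"
    Do While .Busy Or .ReadyState <> 4      ' 4 = READYSTATE_COMPLETE
        DoEvents                            ' yield so the page can finish loading
    Loop
    Sheet1.txtPgSrc.Text = .Document.body.innerHTML ' source as rendered
End With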

Now, the nitty-gritty; in exchange for that nicely parsed file,
what do i owe you?


A Timmies, straight up!

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #48   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

GS wrote:
You are correct, #1) do not need that line 3, and #2) do not need the
extended info.


Ok then, fieldnames will be: Item#,Part#,MCRL,CAGE,Source

File name(s): for PageNumber=1 I would use 5960_001.TXT, ...up to
PageNumber=999 as 5960_999.TXT, and that would preserve order.
*OR*
Reading & parsing from PageNumber=1 to PageNumber=999, one could append
to the same file (name NSN_5960.TXT); might as well - makes it easier to
pour into a single Excel file.
Either way is fine.


Ok, then output filename will be: NSN_5960.txt

I have found a way to get rid of items that are not strictly electron
tubes and/or not regulators; that way you do not have to parse out
these "unfit" items from the first-page description. Use:
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator%20and%20%22ELECTRON%20TUBE%22&PageNumber=1"

Naturally, PageNumber still goes from 1 to 999.
Note the implied "(", ")" and " "; human-readable "5960 regulator and
(ELECTRON TUBE)".
As far as i can tell, using that shows no undesirable parts.


Works nice! Now I get 11 5960 items per parent page.

Thanks!
PS: i found WGET to be non-useful (a) it truncates the filename (b) it
buggers it to partial gibberish


What is WGET?

WGET is a command-line program that will copy the contents of a URL to
the hard drive; it has various options: for SSL, i think for some
processing, for giving the output file a specific name, for recursion, etc.
I was still trying to find ways to copy the online file to the hard drive.

I still do not understand what magic you used.

Now, the nitty-gritty; in exchange for that nicely parsed file, what
do i owe you?
  #49   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 538
Default Read (and parse) file on the web CORRECTION#2

Robert Baer wrote:

PS: i found WGET to be non-useful (a) it truncates the filename (b) it
buggers it to partial gibberish.


Then you must be using a bad version, or perhaps have something wrong with
your .wgetrc. I've been using wget for around 10 years, and never had
anything like those issues unless I pass bad options.

--
My life is richer, somehow, simply because I know that he exists.
  #50   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

GS wrote:
I still do not understand what magic you used.


I'm using the MS WebBrowser control and a textbox on a worksheet!

Now, the nitty-gritty; in exchange for that nicely parsed file, what
do i owe you?


A Timmies, straight up!

The search engine was not exactly forthcoming, to say the least;
everything including the kitchen sink but NOT anything alcoholic.
"Timmies drink" helped some; fifth "hit" down: "Timmy's Sweet and
Sour mix Cocktails and Drink Recipes".
Using "Timmies, straight up" was slightly better... "Average night at
the Manotick Timmies... : ottawa"

In all of this, a lot of "hits" mentioned something (always different)
about the Tim Hortons franchise.

Absolutely no clue regarding rum, scotch, vodka (or dare i say) milk.





  #51   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

Auric__ wrote:
Robert Baer wrote:

PS: i found WGET to be non-useful (a) it truncates the filename (b) it
buggers it to partial gibberish.


Then you must be using a bad version, or perhaps have something wrong with
your .wgetrc. I've been using wget for around 10 years, and never had
anything like those issues unless I pass bad options.

Know nothing about .wgetrc; am in Win2K cmd line, and the batch file
used is:
H:
CD\Win2K_WORK\OIL4LESS\LLCDOCS\FED app\FBA stuff

wget --no-check-certificate --output-document=5960_002.TXT
--output-file=log002.TXT
https://www.nsncenter.com/NSNSearch?...2&PageNumber=2
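REM Note: cmd treats a bare & as a command separator, so if the URL above
REM is not wrapped in double quotes it can be cut off at &PageNumber.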

PAUSE

The SourceForge site offered a Zip which was supposed to be complete,
but none of the created folders had an EXE (tried Win2K, WinXP, Win7).
Found Softonic offering only a plain-jane wget.exe, which i am using,
so that may be a buggered version.
Suggestions?


  #52   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

Auric__ wrote:
Robert Baer wrote:

PS: i found WGET to be non-useful (a) it truncates the filename (b) it
buggers it to partial gibberish.


Then you must be using a bad version, or perhaps have something wrong with
your .wgetrc. I've been using wget for around 10 years, and never had
anything like those issues unless I pass bad options.

I also tried versions 1.18 and 1.13 from
https://eternallybored.org/misc/wget/.
Exactly the same truncation and gibberish.
At least, the 1.13 ZIP had wgetrc in the /etc folder; perhaps one
step forward.
But nobody said where to put the folder set, and certainly nothing
about setting the path, which just maybe might be useful for operation.


  #53   Report Post  
Junior Member
 
Posts: 3
Default

sản phẩm tốt, giá rẻ, chất lượng, an toàn cho người dùng, dịch vụ tuyệt trần, lúc nào có điều kiện qua ủng hộ nhé. chúc cửa hàng làm ăn hiệu quả [Vietnamese: good products, cheap, quality, safe for users, superb service; whenever i get the chance i'll stop by to support you. Wishing the shop a prosperous business.]
  #54   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

GS wrote:
I still do not understand what magic you used.


I'm using the MS WebBrowser control and a textbox on a worksheet!

Now, the nitty-gritty; in exchange for that nicely parsed file,
what
do i owe you?


A Timmies, straight up!

The search engine was not exactly forthcoming, to say the least;
everything including the kitchen sink but NOT anything alcoholic.
"Timmies drink" helped some; fifth "hit" down: "Timmy's Sweet and
Sour mix Cocktails and Drink Recipes".
Using "Timmies, straight up" was slightly better... "Average night
at the Manotick Timmies... : ottawa"

In all of this, a lot of "hits" mentioned something (always
different) about the Tim Hortons franchise.

Absolutely no clue regarding rum, scotch, vodka (or dare i say)
milk.


Ha-ha! Ok..., 'Timmies' is fan-speak for Tim Horton's coffee! <g>

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #55   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

everonvietnam2016 wrote:
sản phẩm tốt, giá rẻ, chất lượng, an toàn cho người
dùng, dịch vụ tuyệt trần, lúc nào có điều kiện qua
ủng hộ nhé. chúc cửa hàng làm ăn hiệu quả




No kapish.
Firstly, on my computer, all i see is a strange mix of characters
from the 512 ASCII set.
Secondly, i would not be able to read or understand your language
even if it was elegantly rendered.




  #56   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

GS wrote:
GS wrote:
I still do not understand what magic you used.

I'm using the MS WebBrowser control and a textbox on a worksheet!

Now, the nitty-gritty; in exchange for that nicely parsed file, what
do i owe you?

A Timmies, straight up!

The search engine was not exactly forthcoming, to say the least;
everything including the kitchen sink but NOT anything alcoholic.
"Timmies drink" helped some; fifth "hit" down: "Timmy's Sweet and Sour
mix Cocktails and Drink Recipes".
Using "Timmies, straight up" was slightly better... "Average night at
the Manotick Timmies... : ottawa"

In all of this, a lot of "hits" mentioned something (always different)
about the Tim Hortons franchise.

Absolutely no clue regarding rum, scotch, vodka (or dare i say) milk.


Ha-ha! Ok..., 'Timmies' is fan-speak for Tim Horton's coffee! <g>

What i did WRT Wget was uninstall it and check that there were no
'dregs' left on the HD.
Then i installed it from scratch, allowing all of the defaults.
Finally, i modified the system path (shortened version):
%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\Program
Files\GnuWin32;

No joy.
Even at the root, the system insists wget does not exist (as an
executable, etc).
  #57   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 538
Default Read (and parse) file on the web

Robert Baer wrote:

everonvietnam2016 wrote:
sản phẩm tốt, giá rẻ, chất lượng, an toàn cho người
dùng, dịch vụ tuyệt trần, lúc nào có điều kiện qua
ủng hộ nhé. chúc cửa hàng làm ăn hiệu quả

No kapish.
Firstly, on my computer, all i see is a strange mix of characters
from the 512 ASCII set.
Secondly, i would not be able to read or understand your language
even if it was elegantly rendered.


It's Vietnamese. *Lots* of accented characters. Also, it's spam.

--
Who says life is sacred? God? Hey, if you read your history,
God is one of the leading causes of death.
-- George Carlin
  #58   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 538
Default Read (and parse) file on the web CORRECTION#2

Robert Baer wrote:

Auric__ wrote:
Robert Baer wrote:

PS: i found WGET to be non-useful (a) it truncates the filename (b) it
buggers it to partial gibberish.


Then you must be using a bad version, or perhaps have something wrong
with your .wgetrc. I've been using wget for around 10 years, and never
had anything like those issues unless I pass bad options.

Know nothing about .wgetrc;


Don't worry about it. It can be used to set default behaviors but every entry
can be replicated via switches.

am in Win2K cmd line, and the batch file
used is:
H:
CD\Win2K_WORK\OIL4LESS\LLCDOCS\FED app\FBA stuff

wget --no-check-certificate --output-document=5960_002.TXT
--output-file=log002.TXT
https://www.nsncenter.com/NSNSearch?...d%20%22ELECTRON%20TUBE%22&PageNumber=2


That wget line performs as expected for me: 5960_002.TXT contains valid HTML
(although I haven't made any attempt to check the data; it looks like most of
the page is CSS) and log002.TXT is a typical wget log of a successful
transfer.

As for truncating the filenames, if I remove the --output-document switch,
the filename I get is

NSNSearch@q=5960%20regulator%20and%20%22ELECTRON%20TUBE%22&PageNumber=2

PAUSE

The SourceForge site offered a Zip which was supposed to be complete,


If you're talking about GNUwin32, that version is years out of date.

but none of the created folders had an EXE (tried Win2K, WinXP, Win7).
Found SofTonic offering only a plain jane wget.exe, which i am using,
so that may be a buggered version.


Never even heard of them.

Suggestions?


I'm using 1.16.3. No idea where I got it.

The batch file that I use for downloading looks like this:

call wget --no-check-certificate -x -c -e robots=off -i new.txt %*

-x Always create directories (e.g. http://a.b.c/1/2.txt -> .\a.b.c\1\2.txt).
-c Continue interrupted downloads.
-e Do this .wgetrc thing (in this case, ignore the robots.txt file).
-i Read list of filenames from the following file ("new.txt" because that's
the default name for a new file in my file manager).

I use the -i switch so I don't have to worry about escaping characters or %
vs %%. Whatever's in the text file is exactly what it looks for. (If you go
this route, it's one file per line.)

--
We have to stop letting George Lucas name our politicians.
  #59   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

Auric__ wrote:
Robert Baer wrote:

Auric__ wrote:
Robert Baer wrote:

PS: i found WGET to be non-useful (a) it truncates the filename (b) it
buggers it to partial gibberish.

Then you must be using a bad version, or perhaps have something wrong
with your .wgetrc. I've been using wget for around 10 years, and never
had anything like those issues unless I pass bad options.

Know nothing about .wgetrc;


Don't worry about it. It can be used to set default behaviors but every entry
can be replicated via switches.

am in Win2K cmd line, and the batch file
used is:
H:
CD\Win2K_WORK\OIL4LESS\LLCDOCS\FED app\FBA stuff

wget --no-check-certificate --output-document=5960_002.TXT
--output-file=log002.TXT
https://www.nsncenter.com/NSNSearch?...d%20%22ELECTRON%20TUBE%22&PageNumber=2


That wget line performs as expected for me: 5960_002.TXT contains valid HTML
(although I haven't made any attempt to check the data; it looks like most of
the page is CSS) and log002.TXT is a typical wget log of a successful
transfer.

As for truncating the filenames, if I remove the --output-document switch,
the filename I get is

NSNSearch@q=5960%20regulator%20and%20%22ELECTRON%20TUBE%22&PageNumber=2

PAUSE

The SourceForge site offered a Zip which was supposed to be complete,


If you're talking about GNUwin32, that version is years out of date.

but none of the created folders had an EXE (tried Win2K, WinXP, Win7).
Found SofTonic offering only a plain jane wget.exe, which i am using,
so that may be a buggered version.


Never even heard of them.

Suggestions?


I'm using 1.16.3. No idea where I got it.

The batch file that I use for downloading looks like this:

call wget --no-check-certificate -x -c -e robots=off -i new.txt %*

-x Always create directories (e.g. http://a.b.c/1/2.txt -> .\a.b.c\1\2.txt).
-c Continue interrupted downloads.
-e Do this .wgetrc thing (in this case, ignore the robots.txt file).
-i Read list of filenames from the following file ("new.txt" because that's
the default name for a new file in my file manager).

I use the -i switch so I don't have to worry about escaping characters or %
vs %%. Whatever's in the text file is exactly what it looks for. (If you go
this route, it's one file per line.)

You must have a different version of Wget; whatever i do on the
command line, including the "trick" of restrict-file-names=nocontrol, i
get a buggered path name plus the response that &PageNumber is not
recognized. Exactly the same results in Win2K, WinXP or Win7.

Yes, i used GNUwin32 as the SourceForge "complete" Wget had no EXE.
Is there some other (compiled, complete) source i should get?
  #60   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

Auric__ wrote:
Robert Baer wrote:

everonvietnam2016 wrote:
sản phẩm tốt, giá rẻ, chất lượng, an toàn cho người
dùng, dịch vụ tuyệt trần, lúc nào có điều kiện qua
ủng hộ nhé. chúc cửa hàng làm ăn hiệu quả

No kapish.
Firstly, on my computer, all i see is a strange mix of characters
from the 512 ASCII set.
Secondly, i would not be able to read or understand your language
even if it was elegantly rendered.


It's Vietnamese. *Lots* of accented characters. Also, it's spam.

Yes, it was easy to recognize that was Vietnamese.
How the heck did you figure out that it was spam?



  #61   Report Post  
Junior Member
 
Posts: 3
Default

sản phẩm tốt, giá rẻ, chất lượng, an toàn cho người sử dụng, dịch vụ ráo trời, lúc nào có điều kiện qua ủng hộ nhé. chúc cửa hàng làm ăn hiệu quả [Vietnamese: good products, cheap, quality, safe for users, superb service; whenever i get the chance i'll stop by to support you. Wishing the shop a prosperous business.]
  #62   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 538
Default Read (and parse) file on the web CORRECTION#2

Robert Baer wrote:

Auric__ wrote:
Robert Baer wrote:

[snip]
CD\Win2K_WORK\OIL4LESS\LLCDOCS\FED app\FBA stuff

wget --no-check-certificate --output-document=5960_002.TXT
--output-file=log002.TXT
https://www.nsncenter.com/NSNSearch?...and%20%22ELECTRON%20TUBE%22&PageNumber=2


That wget line performs as expected for me: 5960_002.TXT contains valid
HTML (although I haven't made any attempt to check the data; it looks
like most of the page is CSS) and log002.TXT is a typical wget log of a
successful transfer.

As for truncating the filenames, if I remove the --output-document
switch, the filename I get is

NSNSearch@q=5960%20regulator%20and%20%22ELECTRON%20TUBE%22&PageNumber=2

[snip]
If you're talking about GNUwin32, that version is years out of date.

You must have a different version of Wget; whatever i do on the
command line,including the "trick" of restrict-file-names=nocontrol, i
get a buggered path name plus the response &PageNumber not recognized.
Exactly same results in Win2K, WinXP or in Win7.


Hmm. Well... it could be that your copy of wget was compiled with old path
length limits (260 characters). I suppose the best thing to do there is to
try a different copy.

Yes, i used GNUwin32 as SourceForge "complete" of Wget had no EXE.
Is there some other (compiled, complete) source i should get?


Just google "wget windows" (without quotes) and start poking around.
Download a few different versions and see if any of them work for you.

--
Stupid railroad plot.
  #63   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 538
Default Read (and parse) file on the web

Robert Baer wrote:

Auric__ wrote:
Robert Baer wrote:

everonvietnam2016 wrote:
sản phẩm tốt, giá rẻ, chất lượng, an toàn cho người
dùng, dịch vụ tuyệt trần, lúc nào có điều kiện qua
ủng hộ nhé. chúc cửa hàng làm ăn hiệu quả

No kapish.
Firstly, on my computer, all i see is a strange mix of characters
from the 512 ASCII set.
Secondly, i would not be able to read or understand your language
even if it was elegantly rendered.


It's Vietnamese. *Lots* of accented characters. Also, it's spam.


Yes, it was easy to recognize that was Vietnamese.
How the heck did you figure out that it was spam?


Well, it's Vietnamese in an almost-entirely English group, replying to a
thread that's entirely in English.

Also, courtesy of Google translate:

Is good, cheap, quality, seated * n for users, divine service, at nÃ*o
conditional support through offline. Wish c »* a hà * ng là * m efficiently

The text is buggered on my end so I can't get a complete translation, but you
can see that it's meant to advertise *something*. The post immediately
preceding it was also Vietnamese spam, complete with a link. (If you didn't
see it, don't worry about it.)

Also, it's deleted from the Google archive. That's a pretty good sign right
there.

--
The only thing your dreams will land you is dead in a ditch.
  #64   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

Currently, it's ready to fully automate, but seems to have a snag
writing past the 1st parent page's child pages.


Turns out the problem was code not waiting until the browser was no
longer busy. I switched to using URLDownloadToFile() at this point
because it's orders of magnitude faster. Using the browser/textbox on a
sheet served well for getting the process code nailed down, but that was
only a temp situation during dev.
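
(For reference, URLDownloadToFile() is the urlmon.dll API; a minimal
sketch of the declaration and call, with everything except the API name
being illustrative:)

' 32-bit VBA shown; 64-bit Office needs Declare PtrSafe and LongPtr.
Private Declare Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Sub FetchPage()
    ' Returns 0 (S_OK) on success; anything else is a failure HRESULT.
    If URLDownloadToFile(0, _
        "https://www.nsncenter.com/NSN/5960-00-503-9529", _
        "C:\temp\tmp.txt", 0, 0) <> 0 Then MsgBox "Download failed"
End Sub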

The links error out from about mid pg7 thru pg10; I time-tested only
pgs 1 thru 10: this took 50.89 secs!

I'll do some housekeeping of the code and post a download link to the
file...

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #65   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web

Excel macros are SO... undocumented.
Need a WORKING example for reading the HTML source of a URL (say
http://www.oil4lessllc.org/gTX.htm)

Thanks.


Look here...

https://app.box.com/s/23yqum8auvzx17h04u4f

...for *ParseWebPages.zip*, which contains:

ParseWebPages.xls
NSN_5960.txt
(Blank data file with fieldnames only in line 1)
NSN_5960_Test.txt
(Results for 1st 20 pages)

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion




  #66   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

"Problem" with Excel, is that there are MANY ways to get what is
needed, and there is NO WAY of discovering _any_ of them; the "help"
document is worse than useless in that manner.
I have found that URLDownloadToFile() to be non-functional for
https
sources.


I disagree; it's working in the project I posted the download
for!

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #67   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web

GS wrote:
Excel macros are SO... undocumented.
Need a WORKING example for reading the HTML source of a URL (say
http://www.oil4lessllc.org/gTX.htm)

Thanks.


Look here...

https://app.box.com/s/23yqum8auvzx17h04u4f

..for *ParseWebPages.zip*, which contains:

ParseWebPages.xls
NSN_5960.txt
(Blank data file with fieldnames only in line 1)
NSN_5960_Test.txt
(Results for 1st 20 pages)

I did not even try cURL as the explanation was just too dern
complicated.
Fiddled in Excel,as it has so many different ways to do something
specific.

So, this is a skeleton of what i have:
Workbooks.Open Filename:=openFYL$ 'opens as R/O, no HD space taken

then..
With Worksheets(1)
' .Copy ''do not need; saves BOOK space
.SaveAs sav$ 'do not know how to close when done
' above creates the file described; that takes HD space, about
300K
End With

IMMEDIATELY after the "End With", a folder is created with useless
metadata info; do not know how to close when done.

WARNING: Scheme works only in XP and Win7.
If in XP, at about 150 files, one gets a PHONY "HD is full" warning
and one must exit Excel so as to be able to delete processed (and so
unwanted) files.
I say PHONY because the system showed NO CHANGE in HD free space,
never mind those files take about 500MB.

Furthermore, in Win7, these files show up in a folder the system
KNOWS NOTHING ABOUT... Windows Explorer does not show C:\Documents,
which IS accessible; C:\<sysname>\My Documents is shown and CANNOT be
accessed.
Instead of the Excel program crashing, the system is shut down and
locked.
YET other reasons I hate Win7.


I don't follow what you're talking about here! What does it have to do
with the download I linked to?

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #68   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

GS wrote:
Currently, it's ready to fully automate, but seems to have a snag
writing past the 1st parent page's child pages.


Turns out the problem was code not waiting until the browser was no
longer busy. I switched to using URLDownloadToFile() at this point
because it's orders of magnitude faster. Using the browser/textbox on a
sheet served well for getting the process code nailed down, but that was
only a temp situation during dev.

"Problem" with Excel, is that there are MANY ways to get what is
needed, and there is NO WAY of discovering _any_ of them; the "help"
document is worse than useless in that manner.
I have found that URLDownloadToFile() to be non-functional for https
sources.


The links error out from about mid pg7 thru pg10; I time-tested only
pgs 1 thru 10: this took 50.89 secs!

I'll do some housekeeping of the code and post a download link to the
file...


  #69   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

GS wrote:
Excel macros are SO... undocumented.
Need a WORKING example for reading the HTML source of a URL (say
http://www.oil4lessllc.org/gTX.htm)

Thanks.


Look here...

https://app.box.com/s/23yqum8auvzx17h04u4f

..for *ParseWebPages.zip*, which contains:

ParseWebPages.xls
NSN_5960.txt
(Blank data file with fieldnames only in line 1)
NSN_5960_Test.txt
(Results for 1st 20 pages)

I did not even try cURL as the explanation was just too dern complicated.
Fiddled in Excel,as it has so many different ways to do something
specific.

So, this is a skeleton of what i have:
Workbooks.Open Filename:=openFYL$ 'opens as R/O, no HD space taken

then..
With Worksheets(1)
' .Copy ''do not need; saves BOOK space
.SaveAs sav$ 'do not know how to close when done
' above creates the file described; that takes HD space, about 300K
End With

IMMEDIATELY after the "End With", a folder is created with useless
metadata info; do not know how to close when done.
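
(On the "do not know how to close" point: keeping the reference
returned by Workbooks.Open lets you close the workbook explicitly when
done; a sketch reusing openFYL$ and sav$, with xlTextWindows assumed as
the wanted text format:)

Dim wb As Workbook
Set wb = Workbooks.Open(Filename:=openFYL$)     ' opens R/O from the URL
wb.Worksheets(1).SaveAs Filename:=sav$, FileFormat:=xlTextWindows
wb.Close SaveChanges:=False                     ' close when done, no prompt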

WARNING: Scheme works only in XP and Win7.
If in XP, at about 150 files, one gets a PHONY "HD is full" warning
and one must exit Excel so as to be able to delete processed (and so
unwanted) files.
I say PHONY because the system showed NO CHANGE in HD free space,
never mind those files take about 500MB.

Furthermore, in Win7, these files show up in a folder the system
KNOWS NOTHING ABOUT... Windows Explorer does not show C:\Documents, which
IS accessible; C:\<sysname>\My Documents is shown and CANNOT be accessed.
Instead of the Excel program crashing, the system is shut down and
locked.
YET other reasons I hate Win7.

  #70   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web

If you're referring to the substitute 'page error' text put in place of
missing item info, ..well that might be misleading you. Fact is,
starting with item7 on pg7 there is no item info on any pages I checked
manually in the browser (up to pg100). Perhaps you could rephrase that
to "No Data Available"!?

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion




  #71   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web

GS wrote:
Excel macros are SO... undocumented.
Need a WORKING example for reading the HTML source of a URL (say
http://www.oil4lessllc.org/gTX.htm)

Thanks.


Look here...

https://app.box.com/s/23yqum8auvzx17h04u4f

..for *ParseWebPages.zip*, which contains:

ParseWebPages.xls
NSN_5960.txt
(Blank data file with fieldnames only in line 1)
NSN_5960_Test.txt
(Results for 1st 20 pages)

Holy S*!
I did about 30 pages by hand; quit as rather tiresome and total
pages unknown (MORE than 999).
Never saw the fail you saw.

Difference is that you used the word "and"; technically (i think)
that should not affect results.
Also, you got items I am interested in, and after processing 503
pages, i did NOT get those.

In both cases, there were a lot of duplicate records (government
data, what else can you expect?).

In your sample, there were 73 useful records containing 43 unique
records. There may be some that i am not interested in, but there
definitely ARE those i did not find that i am interested in.


You could 'dump' the file into a worksheet and filter out the dupes
easily enough.
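
(One version-independent way to skip the dupes in code is a
Scripting.Dictionary used as a seen-set; a minimal sketch against the
agreed file name:)

Sub ListUniques()
    Dim d As Object, sLine As String, f As Integer
    Set d = CreateObject("Scripting.Dictionary")   ' late bound; no reference needed
    f = FreeFile
    Open "NSN_5960.txt" For Input As #f
    Do While Not EOF(f)
        Line Input #f, sLine
        If Not d.Exists(sLine) Then d.Add sLine, 1 ' keep first occurrence only
    Loop
    Close #f
    Debug.Print d.Count & " unique records"        ' d.Keys holds them in order
End Sub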

In my sample, there were 3782 unique records, and (better sit
down), only 15 were interesting. Crappy odds.

Hopefully, when i call them, someone that has some experience and
knowledge of how their sort criteria works, will answer the phone.
Last time i called, i got a new guy; no help other than "use
Google".


Those are dynamic web pages and so are database driven. Surely there's
a repository database for this info somewhere other than NSN?

You have done a masterful job!
Label it !DONE! please.

Thanks a lot.


Happy to be of help; -I found the project rather interesting!

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #72   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

GS wrote:
Excel macros are SO... undocumented.
Need a WORKING example for reading the HTML source of a URL (say
http://www.oil4lessllc.org/gTX.htm)

Thanks.


Look here...

https://app.box.com/s/23yqum8auvzx17h04u4f

..for *ParseWebPages.zip*, which contains:

ParseWebPages.xls
NSN_5960.txt
(Blank data file with fieldnames only in line 1)
NSN_5960_Test.txt
(Results for 1st 20 pages)

Holy S*!
I did about 30 pages by hand; quit as rather tiresome and total pages
unknown (MORE than 999).
Never saw the fail you saw.

Difference is that you used the word "and"; technically (i think)
that should not affect results.
Also, you got items I am interested in, and after processing 503
pages, i did NOT get those.

In both cases, there were a lot of duplicate records (government
data, what else can you expect?).

In your sample, there were 73 useful records containing 43 unique
records. There may be some that i am not interested in, but there
definitely ARE those i did not find that i am interested in.

In my sample, there were 3782 unique records, and (better sit down),
only 15 were interesting. Crappy odds.

Hopefully, when i call them, someone that has some experience and
knowledge of how their sort criteria works, will answer the phone.
Last time i called, i got a new guy; no help other than "use Google".

You have done a masterful job!
Label it !DONE! please.

Thanks a lot.
  #73   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web

I've uploaded a new version that skips dupes, and flags missing item
info. (See the new 'Test' file)

This version also runs orders of magnitude faster!

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #74   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

GS wrote:
GS wrote:
Excel macros are SO... undocumented.
Need a WORKING example for reading the HTML source of a URL (say
http://www.oil4lessllc.org/gTX.htm)

Thanks.

Look here...

https://app.box.com/s/23yqum8auvzx17h04u4f

..for *ParseWebPages.zip*, which contains:

ParseWebPages.xls
NSN_5960.txt
(Blank data file with fieldnames only in line 1)
NSN_5960_Test.txt
(Results for 1st 20 pages)

I did not even try cURL as the explanation was just too dern complicated.
Fiddled in Excel,as it has so many different ways to do something
specific.

So, this is a skeleton of what i have:
Workbooks.Open Filename:=openFYL$ 'opens as R/O, no HD space taken

then..
With Worksheets(1)
' .Copy ''do not need; saves BOOK space
.SaveAs sav$ 'do not know how to close when done
' above creates the file described; that takes HD space, about 300K
End With

IMMEDIATELY after the "End With", a folder is created with useless
metadata info; do not know how to close when done.

WARNING: Scheme works only in XP and Win7.
If in XP, at about 150 files, one gets a PHONY "HD is full" warning and
one must exit Excel so as to be able to delete processed (and so
unwanted) files.
I say PHONY because the system showed NO CHANGE in HD free space,
never mind those files take about 500MB.

Furthermore, in Win7, these files show up in a folder the system KNOWS
NOTHING ABOUT... Windows Explorer does not show C:\Documents, which IS
accessible; C:\<sysname>\My Documents is shown and CANNOT be accessed.
Instead of the Excel program crashing, the system is shut down and
locked.
YET other reasons I hate Win7.


I don't follow what you're talking about here! What does it have to do
with the download I linked to?

In the meantime, i took a stab at a "pure" Excel program to get the data.

Whatever you do, and more explicitly how you do the search, yields
results that i do not see.

Manually downloading the first page for a manual search, I get:

5960 REGULATOR AND "ELECTRON TUBE"
About 922 results (1 ms)
5960-00-503-9529
5960-00-504-8401
5960-01-035-3901
5960-01-029-2766
5960-00-617-4105
5960-00-729-5602
5960-00-826-1280
5960-00-754-5316
5960-00-962-5391
5960-00-944-4671
5960-00-897-8418
and
5960 AND REGULATOR AND "ELECTRON TUBE"
About 104 results (16 ms)
5960-00-503-9529
5960-00-504-8401
5960-01-035-3901
5960-01-029-2766
5960-00-617-4105
5960-00-729-5602
5960-00-826-1280
5960-00-754-5316
5960-00-962-5391
5960-00-944-4671
5960-00-897-8418

Note they are very different, and the second search "gets" a lot less.
Also neither search gets anything you got, and i am interested in how
you did it.

  #75   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

GS wrote:
If you're referring to the substitute 'page error' text put in place of
missing item info, ..well that might be misleading you. Fact is,
starting with item7 on pg7 there is no item info on any pages I checked
manually in the browser (up to pg100). Perhaps you could rephrase that
to "No Data Available"!?

Macht nichts (no matter).
I also looked manually and you are correct.
Why the heck they have NSNs that do not relate to a part is puzzling,
but, hey, it *IS* the government.
Not useful to what i need, but still nice to know.



  #76   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

GS wrote:
GS wrote:
Excel macros are SO... undocumented.
Need a WORKING example for reading the HTML source of a URL (say
http://www.oil4lessllc.org/gTX.htm)

Thanks.

Look here...

https://app.box.com/s/23yqum8auvzx17h04u4f

..for *ParseWebPages.zip*, which contains:

ParseWebPages.xls
NSN_5960.txt
(Blank data file with fieldnames only in line 1)
NSN_5960_Test.txt
(Results for 1st 20 pages)

Holy S*!
I did about 30 pages by hand; quit as rather tiresome and total pages
unknown (MORE than 999).
Never saw the fail you saw.

Difference is that you used the word "and"; technically (i think) that
should not affect results.
Also, you got items I am interested in, and after processing 503
pages, i did NOT get those.

In both cases, there were a lot of duplicate records (government data,
what else can you expect?).

In your sample, there were 73 useful records containing 43 unique
records. There may be some that i am not interested in, but there
definitely ARE those i did not find that i am interested in.


You could 'dump' the file into a worksheet and filter out the dupes
easily enough.

* Yes, i did that; getting those 43 unique records.


In my sample, there were 3782 unique records, and (better sit down),
only 15 were interesting. Crappy odds.

Hopefully, when i call them, someone that has some experience and
knowledge of how their sort criteria works, will answer the phone.
Last time i called, i got a new guy; no help other than "use Google".


Those are dynamic web pages and so are database driven. Surely there's a
repository database for this info somewhere other than NSN?

* Prolly not dynamic as NSNs do not change except possible additions on
rare occasion.
Certainly, a SPECIFIC search (so far) always gives the same results.
Look at a previous response concerning 100% manual search results,
first page only.
5960 AND REGULATOR AND "ELECTRON TUBE" About 104 results
5960 REGULATOR AND "ELECTRON TUBE" About 922 results
Totally different and neither match your results.
And your results look superior.


You have done a masterful job!
Label it !DONE! please.

Thanks a lot.


Happy to be of help; -I found the project rather interesting!


  #77   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web

Also neither search gets anything you got, and i am interested in
how you did it.


If you study the file I gave you, you'll see how both methods are
working. The worksheet implements all manual parsing so you can study
each part of the process as well as the web page source structure; the
*AutoParse* macro collects the data and writes it to the file.
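
(The file-writing side is plain VBA sequential output; a minimal sketch
of appending one record, not necessarily how AutoParse itself does it:)

Sub AppendRecord(sRecord As String)
    Dim f As Integer
    f = FreeFile
    Open "NSN_5960.txt" For Append As #f    ' the agreed output file
    Print #f, sRecord   ' e.g. "5960-00-503-9529,GV3S2800,3302008,63060,HEICO OHMITE LLC"
    Close #f
End Sub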

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #78   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

Robert Baer wrote:
GS wrote:
GS wrote:
Excel macros are SO... undocumented.
Need a WORKING example for reading the HTML source of a URL (say
http://www.oil4lessllc.org/gTX.htm)

Thanks.

Look here...

https://app.box.com/s/23yqum8auvzx17h04u4f

..for *ParseWebPages.zip*, which contains:

ParseWebPages.xls
NSN_5960.txt
(Blank data file with fieldnames only in line 1)
NSN_5960_Test.txt
(Results for 1st 20 pages)

Holy S*!
I did about 30 pages by hand; quit as rather tiresome and total pages
unknown (MORE than 999).
Never saw the fail you saw.

Difference is that you used the word "and"; technically (i think) that
should not affect results.
Also, you got items I am interested in, and after processing 503
pages, i did NOT get those.

In both cases, there were a lot of duplicate records (government data,
what else can you expect?).

In your sample, there were 73 useful records containing 43 unique
records. There may be some that i am not interested in, but there
definitely ARE those i did not find that i am interested in.


You could 'dump' the file into a worksheet and filter out the dupes
easily enough.

* Yes, i did that; getting those 43 unique records.


In my sample, there were 3782 unique records, and (better sit down),
only 15 were interesting. Crappy odds.

Hopefully, when i call them, someone that has some experience and
knowledge of how their sort criteria works, will answer the phone.
Last time i called, i got a new guy; no help other than "use Google".


Those are dynamic web pages and so are database driven. Surely there's a
repository database for this info somewhere other than NSN?

* Prolly not dynamic as NSNs do not change except possible additions on
rare occasion.
Certainly, a SPECIFIC search (so far) always gives the same results.
Look at a previous response concerning 100% manual search results, first
page only.
5960 AND REGULATOR AND "ELECTRON TUBE" About 104 results
5960 REGULATOR AND "ELECTRON TUBE" About 922 results
Totally different and neither match your results.
And your results look superior.


You have done a masterful job!
Label it !DONE! please.

Thanks a lot.


Happy to be of help; -I found the project rather interesting!


I am getting more confused. Search term used and response:
5960 AND REGULATOR AND "ELECTRON TUBE" About 104 results
5960 REGULATOR AND "ELECTRON TUBE" About 922 results
5960 regulator and "ELECTRON TUBE" About 3134377 results

Use of the second two gives exactly the same list for the first page,
and the last term is the one you used in your program.
The results, like i previously said, are completely different WRT the
first term and your program (3 different results).
Notice the 3.1 million results when lower case is used for
"regulator"; i think the database "engine" is thrashing around in what
is almost a useless attempt.
BUT. That thrashing produces very useful results (after sort and
consolidate).

SO.
(1) results are dependent on the form / format of the search term used.
(2) results depend on the (in this case) Excel procedure used that does
the access and fetch.

Now i know rather well that Excel mangles the HTML information when
it is imported; most especially the primary page.
I had my Excel program working to parse the primary HTML page AS SEEN
BY THE HUMAN EYE ON THE WEB, and i had to make a number of changes to
accommodate what Excel gave me.
Therefore, on that basis, i have a rather strong suspicion that what
Excel SENDS to the web for a search is quite different than what we think.

Comments?
Suggestions, primarily to get it more efficient BUT ALSO give it all
to us?

PS: How i get the data:
Workbooks.Open Filename:=openFYL$ 'YES..opens as R/O
With Worksheets(1)
' .Copy ''do not need; saves BOOKn space
.SaveAs sav$ 'do not know how to close when do not need
End With

  #79   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web

The process result after copy/paste a web page into a worksheet is
*entirely different* than reading the webpage source. Both my examples
read webpage source *not the rendered page you see in the browser*! The
fault of using copy/paste on a webpage is that different browsers
*often* won't necessarily display content the same way.

If you read the source in the tmp.txt you 'should' very quickly realize
these pages are a template wherein data is dynamically inserted from a
database via script embedded in the source html.

I use the last URL query *you provided* in both the worksheet approach
and the AutoParse() sub. The tmp.txt file shows the complete webpage
source, whereas txtPgSrc shows the webpage source *as rendered* in
WebBrowser1. WebBrowser1 will display whatever is in the URL cell above
it; AutoParse uses the string defined as Public Const gsUrl1$.

You need to decide what URL string you want to run with and set both
the URL cell and gsUrl1 strings to that. Scrap using the copy/paste
webpage approach altogether because it's unreliable at best and renders
inconsistent results at worst! (*Clue:* Note how WebBrowser1 wraps
content but txtPgSrc does not!)

You are collecting data here, NOT capturing webpage content as
rendered. The data displays according to the source behind the rendered
webpage. That source is structured to be dynamic in terms of what data
is rendered based on the URL string, and HOW it displays depends on the
browser being used to view the data. In this case, WebBrowser1 uses the
same engine as Internet Explorer, and what you see on your screen
*depends on* which version of that engine is running!

If you've ever used HTML to build webpages you'd know (or at the very
least *should know*) instinctively that the code source is the only
reliable element to work with.
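
(Reading a saved source file back for parsing is then a one-shot read;
a sketch, with the tmp.txt path assumed:)

Dim f As Integer, sHtml As String
f = FreeFile
Open "C:\temp\tmp.txt" For Binary As #f
sHtml = Space$(LOF(f))      ' size the buffer to the file length
Get #f, , sHtml             ' slurp the raw page source
Close #f
' sHtml now holds the raw html, independent of any browser's rendering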

HTH

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion


  #80   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

GS wrote:
The process result after copy/paste a web page into a worksheet is
*entirely different* than reading the webpage source. Both my examples
read webpage source *not the rendered page you see in the browser*! The
fault of using copy/paste on a webpage is that different browsers
*often* won't necessarily display content the same way.

If you read the source in the tmp.txt you 'should' very quickly realize
these pages are a template wherein data is dynamically inserted from a
database via script embedded in the source html.

I use the last URL query *you provided* in both the worksheet approach
and the AutoParse() sub. The tmp.txt file shows the complete webpage
source, whereas txtPgSrc shows the webpage source *as rendered* in
WebBrowser1. WebBrowser1 will display whatever is in the URL cell above
it; AutoParse uses the string defined as Public Const gsUrl1$.

You need to decide what URL string you want to run with and set both the
URL cell and gsUrl1 strings to that. Scrap using the copy/paste webpage
approach altogether because it's unreliable at best and renders
inconsistent results at worst! (*Clue:* Note how WebBrowser1 wraps
content but txtPgSrc does not!)

You are collecting data here, NOT capturing webpage content as rendered.
The data displays according to the source behind the rendered webpage.
That source is structured to be dynamic in terms of what data is
rendered based on the URL string, and HOW it displays depends on the
browser being used to view the data. In this case, WebBrowser1 uses the
same engine as Internet Explorer, and what you see on your screen
*depends on* which version of that engine is running!

If you've ever used HTML to build webpages you'd know (or at the very
least *should know*) instinctively that the code source is the only
reliable element to work with.

HTH

Maybe i was not too clear.
Case one:
Using a browser, go to https://www.nsncenter.com/ and give it a
search term: 5960&REGULATOR&"ELECTRON TUBE" in the NSN box, and click on
the WebFLIS Search green button.
Then use the browser "File" pulldown, select Save Page As and modify
the extension to .TXT
The resulting file is a bit different than what one sees in other
methods.
Case two:
Choose a method of getting the search results; a given search term
will always produce the same results (ie: reproducible), and small
changes of the search term may give different results - and THOSE
DIFFERENCES are some of what i am talking about.
Case three:
Choose a given search term, and compare results between various
methods; DIFFERENCES may be huge, also some of what i am talking about.

In case three, with your program, whatever is happening gives a
radically different result. And that result is VERY useful.

For some unknown reason, your program/macro refuses to run, and gives
the following error message: "Can't find project or library".

Would you be so kind as to modify the search term in your program to
5960&REGULATOR&"ELECTRON TUBE" and run it? and please send the results?


