#1: Read (and parse) file on the web
Posted to microsoft.public.excel.programming, microsoft.public.excel

Robert Baer wrote:

Excel macros are SO... undocumented.


Sure they are. Plenty of documentation for individual keywords. You're not
looking for that, you're looking for a broader concept than individual
keywords.

Need a WORKING example for reading the HTML source of a URL (say
http://www.oil4lessllc.org/gTX.htm)


Downloading a web page (or any URI, really) is easy:

Declare Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Sub foo()
    Dim tmp, result, contents
    tmp = Environ("TEMP") & Format$(Now, "yyyymmdd-hhmmss-") & "gTX.htm"
    'download
    result = URLDownloadToFile(0, "http://www.oil4lessllc.org/gTX.htm", _
        tmp, 0, 0)
    If result < 0 Then
        'failed to download; error handler here
    Else
        'read from file
        Open tmp For Binary As 1
        contents = Space$(LOF(1))
        Get #1, 1, contents
        Close 1
        'parse file here
        '[...]
        'cleanup
    End If
    Kill tmp
End Sub

(Note that URLDownloadToFile must be declared PtrSafe on 64-bit versions of
Excel.)
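For anyone on 64-bit Office, a conditional declaration keeps the same module compiling in both environments. This is a sketch; the LongPtr parameters are the standard VBA7 adjustment for pointer-sized arguments:

```vba
#If VBA7 Then
    Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" (ByVal pCaller As LongPtr, _
        ByVal szURL As String, ByVal szFileName As String, _
        ByVal dwReserved As Long, ByVal lpfnCB As LongPtr) As Long
#Else
    Declare Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
        ByVal szURL As String, ByVal szFileName As String, _
        ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
#End If
```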

Dealing with the downloaded data is very dependent on what the page contains
and what you want to extract from it. That page you mentioned contains 2
images and 2 JavaScript arrays; assuming you want the data from the arrays,
you could search for "awls[" or "aprd[" and get your data that way.

Rather than downloading to a file, it is possible to download straight to
memory, but I find it's simpler to use a temp file. Among other things,
downloading to memory requires opening and closing the connection yourself;
URLDownloadToFile handles that for you.
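To make the "search for the marker" idea concrete, here is a minimal sketch. The marker string "awls[" comes from the page mentioned above; the assumption that each array statement ends with a semicolon is mine and would need checking against the actual source:

```vba
Sub parseArrays(contents As String)
    Dim p As Long, q As Long, chunk As String
    p = InStr(1, contents, "awls[", vbTextCompare)
    Do While p > 0
        q = InStr(p, contents, ";")         'assumed statement terminator
        If q = 0 Then Exit Do
        chunk = Mid$(contents, p, q - p)    'one array assignment
        Debug.Print chunk                   'inspect or split further here
        p = InStr(q, contents, "awls[", vbTextCompare)
    Loop
End Sub
```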

--
- We lose the ones we love; we cannot change it. Put it aside.
- How? How can I do what is needed, when all I feel is... hate?
#2: Read (and parse) file on the web

I wrote:

tmp = Environ("TEMP") & Format$(Now, "yyyymmdd-hhmmss-") & "gTX.htm"


Damn, missed a backslash.

tmp = Environ("TEMP") & "\" & Format$(Now, "yyyymmdd-hhmmss-") & "gTX.htm"

--
I'm willing to table this discussion for now, and sneak out, if you are.
#3: Read (and parse) file on the web

Excel macros are SO... undocumented.
Need a WORKING example for reading the HTML source of a URL (say
http://www.oil4lessllc.org/gTX.htm)

Thanks.
#4: Read (and parse) file on the web

Auric__ wrote:
[snip]

Thanks.
I know that the parsing is very dependent on the contents and what
one wants.
That is why i gave an example having at least one data array, similar
to what i may be parsing.
I too like temp files, because i can open them in random and/or
binary mode if needed.
A bit of fun to read such backwards (like BMP files).

Thanks again.

#5: Read (and parse) file on the web

Auric__ wrote:
I wrote:

tmp = Environ("TEMP") & Format$(Now, "yyyymmdd-hhmmss-") & "gTX.htm"


Damn, missed a backslash.

tmp = Environ("TEMP") & "\" & Format$(Now, "yyyymmdd-hhmmss-") & "gTX.htm"

Thanks for the correction to the "path" of destruction.



#6: Read (and parse) file on the web

Auric__ wrote:
[snip]

Well, i am in a pickle.
Firstly, i did a bit of experimenting, and i discovered a few things.
1) The variable "tmp" is nice, but does not have to have the date and
time; that would fill the HD since i have thousands of files to process.
Fix is easy - just use a constant name.
2) The file "tmp" is created ONLY if the value of "result" is zero.
3) The problem i have seems to be due to the fact that the online files
have no filetype:
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5"
And i need to process page numbers from 5 to 999 (the reason for the
program).
I have processed page numbers from 1 to 4 by hand...PITA.

Ideas for a fix?
Thanks.
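The page-number problem above suggests a loop that rebuilds the URL each time and reuses one temp name. A sketch, assuming the URLDownloadToFile declaration shown earlier in the thread; the parse step is left as a stub:

```vba
Sub fetchAllPages()
    Dim page As Long, url As String, tmp As String, result As Long
    tmp = Environ("TEMP") & "\nsn-page.htm"   'constant temp name, reused
    For page = 5 To 999
        url = "https://www.nsncenter.com/NSNSearch?q=5960%20regulator" & _
              "&PageNumber=" & page
        result = URLDownloadToFile(0, url, tmp, 0, 0)
        If result = 0 Then
            'open tmp, parse, and store the results here
            Kill tmp   'cleanup before the next page
        End If
    Next page
End Sub
```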
#8: Read (and parse) file on the web CORRECTION

Robert Baer wrote:
Auric__ wrote:
[snip]
Well, i am in a pickle.
Firstly, i did a bit of experimenting, and i discovered a few things.
1) The variable "tmp" is nice, but does not have to have the date and
time; that would fill the HD since i have thousands of files to process.

* Error: if i have umpteen files to process, then i must use their
unique names.

Fix is easy - just have a constant name for use.

* Error: i can issue a KILL each time (a "KILL *.*" does not work).

2) The file "tmp" is created ONLY if the value of "result" is zero.
3) The problem i have seems to be due to the fact that the online files
have no filetype:
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5"
And i need to process page numbers from 5 to 999 (the reason for the
program).
I have processed page numbers from 1 to 4 by hand...PITA.

Ideas for a fix?
Thanks.


#9: Read (and parse) file on the web CORRECTION#2

Robert Baer wrote:
[snip]
Well, it is rather bad.
I created a file with no filetype and put it on the web.
The process URLDownloadToFile worked, with "tmp" and "result" being
correct.
BUT....
The line "Open tmp For Binary As 1" barfed with error message "Bad
filename or number" (remember, it worked with my gTX.htm file).

So, there is "something" about that site that seems to prevent read
access.
Is there a way to get a clue (and maybe fix it)?

And assuming a fix, what can i do about the OPEN command/syntax?
// What i did in Excel:
S$ = "D:\Website\Send .Hot\****"
tmp = Environ("TEMP") & "\" & S$
result = URLDownloadToFile(0, S$, tmp, 0, 0)
'The file is at "http://www.oil4lessllc.com/****" ; an Excel file
#10: Read (and parse) file on the web

Well, i am in a pickle.
Firstly, i did a bit of experimenting, and i discovered a few
things.
1) The variable "tmp" is nice, but does not have to have the date and
time; that would fill the HD since i have thousands of files to
process.


So you did not 'catch' that the tmp file is deleted when this line...

Kill tmp

...gets executed!

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus



#12: Read (and parse) file on the web CORRECTION#2

Robert Baer wrote:

And assuming a fix, what can i do about the OPEN command/syntax?
// What i did in Excel:
S$ = "D:\Website\Send .Hot\****"
tmp = Environ("TEMP") & "\" & S$


The contents of the variable tmp at this point:

tmp = "C:\Users\auric\D:\Website\Send .Hot\****"

Do you see the problem?

Also, as Garry pointed out, cleanup should happen automatically. The "Kill"
keyword deletes files.

Try this code:

Declare Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Function downloadFile(what As String) As String
    'returns tmp file's path on success or empty string on failure
    Dim tmp, result, contents, fnum
    tmp = Environ("TEMP") & "\" & Format$(Now, "yyyymmdd-hhmmss-") & _
        "downloaded.tmp"
    'download
    result = URLDownloadToFile(0, what, tmp, 0, 0)
    If result < 0 Then
        'failed to download; error handler here, if any
        On Error Resume Next 'can be avoided by checking if the file exists
        Kill tmp 'cleanup
        On Error GoTo 0
        downloadFile = ""
    Else
        'read from file
        fnum = FreeFile
        Open tmp For Binary As fnum
        contents = Space$(LOF(fnum))
        Get #fnum, 1, contents
        Close fnum
        downloadFile = tmp
    End If
End Function

Sub foo()
    Dim what As String, files_to_get As Variant, L0
    'specify files to download
    files_to_get = Array("http://www.oil4lessllc.com/****", _
        "http://www.oil4lessllc.org/gTX.htm")
    For L0 = LBound(files_to_get) To UBound(files_to_get)
        what = downloadFile(files_to_get(L0))
        If Len(what) Then
            '****parse file here*****
            Kill what 'cleanup
        End If
    Next
End Sub

--
THIS IS NOT AN APPROPRIATE ANSWER.
#13: Read (and parse) file on the web CORRECTION#2

Auric__ wrote:
Robert Baer wrote:

[snip]

Grumble... do not understand well enough to get it working.
Now i do not know what i had that fully worked with the gTX.htm file.

The following "almost" works; it fails on the Open.

Declare Function URLDownloadToFile Lib "urlmon" _
    Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, _
    ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Sub foo()
    Dim tmp, result, content$
    SRC1$ = "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5"
    SRC3$ = "https://www.nsncenter.com/NSNSearch?q=5960"
    SRC2$ = "https://www.nsncenter.com/NSN/5960" 'example only
    SRCA$ = "https://www.nsncenter.com/"
    SRCB$ = "NSNSearch?q=5960%20regulator&PageNumber=5"
    FSC$ = "/NSN/5960"

    'Kill "C:\DOCUME~1\ROBERT~1\LOCALS~1\Temp\SRC1$"
    S$ = "D:\Website\Send .Hot\****" ' "result" passes
    'S$ = "http://www.oil4lessllc.com/REACH.PDF" ' "result" passes
    'S$ = "http://www.oil4lessllc.org/gTX.htm" ' "result" passes

    tmp = Environ("TEMP") & "\" & S$
    ' The file "tmp" exists ONLY if the value of "result" is zero
    'download
    result = URLDownloadToFile(0, S$, tmp, 0, 0)

    If result < 0 Then
        'failed to download; error handler here
        Stop
    Else
        'read from file
        Open tmp For Binary As 1 ' Bad file name or number
        content$ = Space$(LOF(1))
        lim = LOF(1)
        Get #1, 1, content$
        Close 1
        'parse file here
        For pick = 1 To lim Step 50
            L1$ = Mid$(content$, pick, 50)
            x = x
        Next pick
        '[...]
        'cleanup
    End If
    Kill tmp
End Sub




#14: Read (and parse) file on the web

GS wrote:
Well, i am in a pickle.
Firstly, i did a bit of experimenting, and i discovered a few things.
1) The variable "tmp" is nice, but does not have to have the date and
time; that would fill the HD since i have thousands of files to process.


So you did not 'catch' that the tmp file is deleted when this line...

Kill tmp

...gets executed!

I know about that; i have been stepping thru the execution (F8), and
stop long before that.
Then i go to CMD prompt and check files and DEL *.* when appropriate
for next test.

#15: Read (and parse) file on the web

but does not have to have the date and time; that would fill the HD
since i have thousands of files to process.


Filenames have nothing to do with storage space; it's the file size!
Given Auric__'s suggestion creates text files, the size of 999 txt files
would hardly be more than 1MB total! If you append each page to the 1st
file then all pages could be in 1 file...
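Appending each fetched page to one accumulating file could be sketched like this (the file paths are placeholders, not from the thread):

```vba
Sub appendPage(pagePath As String, logPath As String)
    Dim fIn As Integer, fOut As Integer, data As String
    'read the downloaded page
    fIn = FreeFile
    Open pagePath For Binary As fIn
    data = Space$(LOF(fIn))
    Get #fIn, 1, data
    Close fIn
    'append it to the single log file
    fOut = FreeFile
    Open logPath For Append As fOut
    Print #fOut, data
    Close fOut
End Sub
```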

--
Garry




#16: Read (and parse) file on the web

After looking at your link to p5, I see what you mean by the amount of
storage space, but filename is not a factor. Parsing will certainly
downsize each file considerably...

--
Garry


#17: Read (and parse) file on the web CORRECTION#2

Robert Baer wrote:

[snip]
Grumble... do not understand well enough to get it working.
Now i do not know what i had that fully worked with the gTX.htm file.

The following "almost" works; it fails on the Open.

You know, one of us is confused, and I'm not entirely sure it isn't me. I've
given you (theoretically) working code twice now, and yet you insist on
making some pretty radical changes that DON'T ****ING WORK!

So, let's step back from the coding for a moment, and let's have you explain
***EXACTLY*** what it is you want done. Give examples like, "given data X, I
want to do Y, with result Z."

Unless I get a clearer explanation of what you're trying to do, I'm done
with this thread.

--
She died and we did nothing.
#19: Read (and parse) file on the web

I have also found the following:
1) one does not have to use FreeFile for a file number (when all else
is OK).


True, however not considered 'best practice'. Freefile() ensures a
unique ID is assigned to your var.

2) cannot use "contents" for string storage space.

Why not? It's not a VB[A] keyword and so qualifies for use as a var.


3) one cannot mix use of "/" and "\" in a string for a given file
name.


Not sure why you'd use "/" in a path string! Forward slash is not a
legal filename/path character on Windows. Backslash is the default
Windows path delimiter. If choosing folders, the last backslash is not
followed by a filename.

4) one cannot have a space in the file name, so that gives a serious
problem for some web URLs (work-around anyone?)


Web paths encode spaces so the string is contiguous. I believe the
escape is "%20" or "+".

5) method fails for "https:" (work-around anyone?)


What method?
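On point 4: spaces can be percent-encoded before calling URLDownloadToFile, which accepts the encoded form. A one-line sketch:

```vba
Function encodeSpaces(url As String) As String
    'replace literal spaces with their "%20" escape
    encodeSpaces = Replace(url, " ", "%20")
End Function
```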

--
Garry


#20: Read (and parse) file on the web

GS wrote:
but does not have to have the date and time; that would fill the HD
since i have thousands of files to process.


Filenames have nothing to do with storage space; it's the file size!
Given Auric__'s suggestion creates text files, the size of 999 txt files
would hardly be more than 1MB total! If you append each page to the 1st
file then all pages could be in 1 file...

True, BUT the files can be large:
"result = URLDownloadToFile(0, S$, tmp, 0, 0)" creates a file in TEMP
the size of the source - which can be multi-megabytes; 999 of them can
eat the HD space fast.
Hopefully a URL file size does not exceed the space limit allowed in
Excel 2003 string space (anyone know what that might be?).

I have found that the string value, AKA the TEMP filename, can be fixed
to anything reasonable, and does not have to include
parts/substrings/subsets of the file one wants to download.

It can be "a good thing" (quoting Martha Stewart) to delete the file
when done.

I have also found the following:
1) one does not have to use FreeFile for a file number (when all else is
OK).
2) cannot use "contents" for string storage space.
3) one cannot mix use of "/" and "\" in a string for a given file name.
4) one cannot have a space in the file name, so that gives a serious
problem for some web URLs (work-around anyone?)
5) method fails for "https:" (work-around anyone?)


#21: Read (and parse) file on the web

GS wrote:
After looking at your link to p5, I see what you mean by the amount
of
storage space, but filename is not a factor. Parsing will certainly
downsize each file considerably...

One has to open the file to parse, so that is not logical.
The function URLDownloadToFile gives zero options - it copies ALL
of the source into a TEMP file; one hopes that the source is not
equal to or larger than 4GB in size!

For pages on the web, that is extremely unlikely; webpage size max
limit prolly is 10MB; maybe 300K worst case on the average.

So, once in TEMP, it can be opened for input (text), for random
(may specify buffer size), or for binary (may specify buffer size).
Here, one can optimize read speed VS string space used.


Your last statement contradicts your first statement. Sounds like you
need to do some *extensive* research into standard VB file I/O
procedures and general parsing techniques!

--
Garry


#22: Read (and parse) file on the web

5) method fails for "https:" (work-around anyone?)

These URLs usually require some kind of 'login' be done, which needs to
be included in the URL string.
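When URLDownloadToFile fails on an https URL, one possible work-around is the MSXML2 HTTP client, which negotiates TLS itself and returns the page as a string. This is a sketch; availability of the "MSXML2.XMLHTTP" ProgID depends on the installed MSXML version:

```vba
Sub fetchHttps()
    Dim http As Object, contents As String
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", "https://www.nsncenter.com/NSNSearch?q=5960", False
    http.send
    If http.Status = 200 Then
        contents = http.responseText   'parse here
    End If
End Sub
```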

--
Garry


#23: Read (and parse) file on the web

GS wrote:
After looking at your link to p5, I see what you mean by the amount of
storage space, but filename is not a factor. Parsing will certainly
downsize each file considerably...

One has to open the file to parse, so that is not logical.
The function URLDownloadToFile gives zero options - it copies ALL of
the source into a TEMP file; one hopes that the source is not equal to
or larger than 4GB in size!

For pages on the web, that is extremely unlikely; webpage size max
limit prolly is 10MB; maybe 300K worst case on the average.

So, once in TEMP, it can be opened for input (text), for random (may
specify buffer size), or for binary (may specify buffer size).
Here, one can optimize read speed VS string space used.


#24: Read (and parse) file on the web

GS wrote:
I have also found the following:
1) one does not have to use FreeFile for a file number (when all else
is OK).


True, however not considered 'best practice'. Freefile() ensures a
unique ID is assigned to your var.

2) cannot use "contents" for string storage space.

Why not? It's not a VB[A] keyword and so qualifies for use as a var.

* Get run-time error 458, "variable uses an Automation type not
supported in Visual Basic".



3) one cannot mix use of "/" and "\" in a string for a given file name.


Not sure why you'd use "/" in a path string! Forward slash is not a
legal filename/path character on Windows. Backslash is the default
Windows path delimiter. If choosing folders, the last backslash is not
followed by a filename.

I note that in Windoze, that "\" is used,and on the web, "/" is used.


4) one cannot have a space in the file name, so that gives a serious
problem for some web URLs (work-around anyone?)


Web paths encode spaces so the string is contiguous. I believe the
escape is "%20" or "+".

Yes; "%20" is used, and it seems to act like a space and kill
the usefulness of the URLDownloadToFile function.


5) method fails for "https:" (work-around anyone?)


What method?

* the function URLDownloadToFile. Is "method" the wrong term?




#25: Read (and parse) file on the web

GS wrote:
GS wrote:
[snip]


Your last statement contradicts your first statement. Sounds like you
need to do some *extensive* research into standard VB file I/O
procedures and general parsing techniques!

Perhaps you have some things confused.
In that program, "tmp" is a string used for the name of a (hopefully)
to-be created file.

THAT file can be large,as it MUST "hold" the contents of the URL
being transferred.

Like i said, the function URLDownloadToFile gives zero options - it
copies ALL of the source into a TEMP file (named via "tmp").

Then and only then can one do I/O. I am an expert on file I/O and
parsing: reading backwards, reading (and parsing) TEXT files with the
file in binary or random mode, "inserting" and/or "clipping" stuff
into/out of the middle of a file, etc. When pressed, i could even
write a file backwards, but i have yet to see any reason to try that.





  #26   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

GS wrote:
5) method fails for "https:" (work-around anyone?)


These URLs usually require some kind of 'login' be done, which needs to
be included in the URL string.

NO login required; try it.

  #27   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web

I note that in Windoze, "\" is used, and on the web, "/" is used.


For clarity.., a Windows file path is not the same as a URL.

Windows file paths allow spaces; URLs do not.
Windows path delimiter is "\"; Web path delimiter is "/".

The function URLDownloadToFile() downloads szURL to szFilename.

Once downloaded, szFilename needs to be opened, parsed, and result
stored locally. Then the next file needs to be downloaded, parsed, and
stored. And so on until all files have been downloaded and parsed.
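That download-parse-store cycle might be sketched like this (BASE_URL
and the parsing step are placeholders; assumes the URLDownloadToFile
declaration shown earlier in the thread):

```vb
Sub ProcessAllPages()
    Dim pg As Long, tmp As String
    Const BASE_URL = "http://example.com/page"    'placeholder URL stem
    For pg = 5 To 999
        tmp = Environ("TEMP") & "\page" & Format$(pg, "000") & ".txt"
        If URLDownloadToFile(0, BASE_URL & pg, tmp, 0, 0) = 0 Then
            'open tmp, parse out the wanted lines, append to master file
            Kill tmp            'clean up before the next page
        End If
    Next pg
End Sub
```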

Since the actual page contents comprise only a small portion of the
files being downloaded, their size should be considerably smaller after
parsing. If you extract the data (text) only (no images) and save this
to a txt file you should be able to 'append' to a single file which
would result in occupying far less disc space. (For example, pg5 is
less than 1kb) I suspect, though, that you need the image to identify
the item source (Raytheon, RCA, Lucent Tech, NAWC, MIL STD, etc)
because this info is not stored in the image file metadata. Otherwise,
the txt file after parsing pg5's text is the following 53 lines:

NSN 5960-00-509-3171
5960-00-509-3171

ELECTRON TUBE

NSN 5960-00-569-9531
5960-00-569-9531

ELECTRON TUBE

NSN 5960-00-553-3770
5960-00-553-3770

ELECTRON TUBE

NSN 5960-00-682-8624
5960-00-682-8624

ELECTRON TUBE

NSN 5960-00-808-6928
5960-00-808-6928

ELECTRON TUBE

NSN 5960-00-766-1953
5960-00-766-1953

ELECTRON TUBE

NSN 5960-00-850-6169
5960-00-850-6169

ELECTRON TUBE

NSN 5960-00-679-8153
5960-00-679-8153

ELECTRON TUBE

NSN 5960-00-134-6884
5960-00-134-6884

ELECTRON TUBE

NSN 5960-00-061-8610
5960-00-061-8610

ELECTRON TUBE

5960-00-067-9636

ELECTRON TUBE

The file size is 711 bytes, and lists 11 items. Note the last item has
no image and so no filler text (NSN line). This inconsistency makes
parsing the contents difficult since you don't know which items do not
have images.

If you copy/paste pg5 into Excel you get both text and image. You could
then do something to construct the info in a database fashion...

Col Headers:
Source :: PartNum :: Description

...and put the data in the respective columns. This seems very
inefficient but is probably less daunting than what you've been doing
manually thus far. Auto Complete should be helpful with this, and you
could sort the list by Source. Note that clicking the image or part# on
the worksheet takes you to the same page as does clicking it on the web
page. In the case of pg5, the data will occupy 11 rows.

Seems like your approach is the long way; -I'd find a better data
source myself! Perhaps subscribe to an electronics database utility
(such as my CAD software would use) that I can update by downloading a
single db file <g>

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion

---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus

  #28   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

Auric__ wrote:
Robert Baer wrote:

Auric__ wrote:
Robert Baer wrote:

And assuming a fix, what can i do about the OPEN command/syntax?
// What i did in Excel:
S$ = "D:\Website\Send .Hot\****"
tmp = Environ("TEMP")& "\"& S$

The contents of the variable S$ at this point:

S$ = "C:\Users\auric\D:\Website\Send .Hot\****"

Do you see the problem?

Also, as Garry pointed out, cleanup should happen automatically. The

"Kill"
keyword deletes files.

Try this code:


[snip]

Grumble..do not understand well enough to get working.
Now i do not know what i had that fully worked with the gTX.htm file.

The following "almost" works; it fails on the open.


You know, one of us is confused, and I'm not entirely sure it isn't me. I've
given you (theoretically) working code twice now, and yet you insist on
making some pretty radical changes that DON'T ****ING WORK!

So, let's step back from the coding for a moment, and let's have you explain
***EXACTLY*** what it is you want done. Give examples like, "given data X, I
want to do Y, with result Z."

Unless I get a clearer explanation of what you're trying to do, I'm done
with this thread.

I wish to read and parse every page of
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber="
where the page number goes from 5 to 999.
On each page, find "<a href="/NSN/5960 [it is longer, but that is the
start].
Given the full number (eg: <a href="/NSN/5960-00-831-8683"), open a
new related page "https://www.nsncenter.com/NSN/5960-00-831-8683" and
find the line ending "(MCRL)".
Read about 4 lines down to <a href="/PartNumber/ which is <a
href="/PartNumber/GV4S1400" in this case.
save/write that line plus the next three; close this secondary online
URL and step to next "<a href="/NSN/5960 to process the same way.
Continue to end of the page, close that URL and open the next page.
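The first-file step above might be sketched in VBA like this (the
string offsets assume the literal <a href="/NSN/5960 form described;
downloading the page source is assumed handled separately):

```vb
Function ExtractNSNs(ByVal pageSrc As String) As Collection
    'collect every NSN referenced by an <a href="/NSN/5960... link
    Dim out As New Collection, p As Long
    Const TAG = "<a href=""/NSN/5960"    '18 characters
    p = InStr(pageSrc, TAG)
    Do While p > 0
        out.Add "5960" & Mid$(pageSrc, p + Len(TAG), 12)  'e.g. 5960-00-831-8683
        p = InStr(p + 1, pageSrc, TAG)
    Loop
    Set ExtractNSNs = out
End Function
```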

Crude code:
CLOSE ' PRS5960.BAS (QuickBasic)
' watch linewrap below..
SRC1$ = "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber="
SRC2$ = "https://www.nsncenter.com/NSN/5960" 'example only
FSC$ = "/NSN/5960"
OPEN "FSC5960.TXT" FOR APPEND AS #9
' Let page number run from 05 to 39 to read existing files
FOR PG = 5 TO 39
  A$ = ""
  FPG$ = RIGHT$("0" + MID$(STR$(PG), 2), 2)
  ' These files, FPG$ + ".txt", are copies from the web
  OPEN FPG$ + ".txt" FOR INPUT AS #1
  ON ERROR GOTO END1
  PRINT FPG$ + ".txt", 'is screen note to me
  WHILE NOT EOF(1)
    WHILE INSTR(A$, FSC$) = 0 'skip 7765 lines of junk
      LINE INPUT #1, A$ 'look for <a href="/NSN/5960-00-754-5782" Class= ETC
    WEND
    P = INSTR(A$, FSC$) + 9: FPG2$ = SRC2$ + MID$(A$, P, 12)
    NSN$ = "5960" + MID$(A$, P, 12)
    PRINT NSN$ 'is screen note to me
    AHREF$ = ".." + FSC$ + MID$(A$, P, 12)
    'Need URL FPG2$ or .. a href to get balance of data
    ' See comments above this program
    PRINT #9, NSN$
    LINE INPUT #1, A$
  WEND
END1:
  RESUME LAB
LAB:
  CLOSE #1
NEXT PG
CLOSE
SYSTEM
**
Note the Function URLDownloadToFile does not allow spaces; there is
one in the "page" URL.
Problem #2, the Function URLDownloadToFile does not allow https
website URLs.
Other than those problems, i have everything else working fine.





  #29   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

So what you also want is the linked file (web page) the image or part#
links to! Here's what I got from
https://www.nsncenter.com/NSN/5960-00-831-8683 (pg4):

1st occurrence of <a href="/NSN/5960 is at line 7878;

1st occurrence of (MCRL) is at line 7931;

1st occurrence after that of <a href="/PartNumber" is this, at line 7951:
<td align="center" style="vertical-align: middle;"><a
href="/PartNumber/GV4S1400">GV4S1400</a></td>

and the next 3 lines are:
<td style="width: 125px; height: 60px; vertical-align: middle;"
align="center" nowrap>&nbsp;&nbsp;<a
href="/CAGE/63060">63060</a>&nbsp;&nbsp;</td>
<td align="center" style="vertical-align: middle;">&nbsp;&nbsp;<a
href="/CAGE/63060"><img class="img-thumbnail"
src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45
width=90 /></a>&nbsp;&nbsp;</td>
<td text-align="center" style="vertical-align: middle;"><a title="CAGE
63060" href="/CAGE/63060">HEICO OHMITE LLC</a></td>


So you want to go to the next page linked to and repeat the process?

At this point my Excel sheet has been modified as follows:

Source | NSN Item# | Description | Part# | MCRL#
Tektronix | 5960-00-831-8683 | ELECTRON TUBE | GV4S1400 | 4932653
<a href="/CAGE/63060">63060</a>
<a href="/CAGE/63060"><img class="img-thumbnail"
src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45
width=90 /></a>
<a title="CAGE 63060" href="/CAGE/63060">HEICO OHMITE LLC</a>

General Dynamics | 5960-00-853-8207 | ELECTRON TUBE | 295-29434 | 5074477
line1
line2
line3

...and so on.

So far, I'm working with text files and so I'm inclined to append each
item to a file named "ElectronTube_NSN5960.txt". File contents for the
2 items above would be structured so the 1st line contains headings
(data fields) so it works properly with ADODB. (Note that I use a comma
as delimiter, and the file does not contain any blank lines)...

Source,NSN Item#,Description,Part#,MCRL#
Tektronix,5960-00-831-8683,ELECTRON TUBE,GV4S1400,4932653
<a href="/CAGE/63060">63060</a>
<a href="/CAGE/63060"><img class="img-thumbnail"
src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45
width=90 /></a>
<a title="CAGE 63060" href="/CAGE/63060">HEICO OHMITE LLC</a>
General Dynamics,5960-00-853-8207,ELECTRON TUBE,295-29434,5074477
<a href="/CAGE/1VPW8">1VPW8</a>
<a href="/CAGE/1VPW8"><img class="img-thumbnail"
src="https://az774353.vo.msecnd.net/cage/90/1vpw8.jpg" alt="CAGE 1VPW8"
height=45 width=90 /></a>
<a title="CAGE 1VPW8" href="/CAGE/1VPW8">GENERAL DYNAMICS C4 SYSTEMS,
INC.</a>

...where I have parsed off the CSS formatting text and html tags outside
<a>...</a> from the 3 lines. I'd likely convert the UCase to proper case
as well.

The file size is 653 bytes meaning a full page would be about 4kb; 1000
pages being about 4mb. That's 44 lines per page after the fields line.

A file this size can be easily handled via ADO recordset or std VB file
I/O functions/methods. Loading into an array (vData) puts fields in
vData(0) and records starting at vData(1), and looping would Step 4.
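That layout might be walked like this sketch (the file name is from the
post above; the rest of the routine is an assumption):

```vb
Sub WalkRecords()
    Dim fnum As Integer, s As String, vData As Variant, i As Long
    fnum = FreeFile
    Open "ElectronTube_NSN5960.txt" For Binary As #fnum
    s = Space$(LOF(fnum))
    Get #fnum, 1, s
    Close #fnum
    vData = Split(s, vbCrLf)            'vData(0) = field names
    For i = 1 To UBound(vData) Step 4   'each record spans 4 lines
        Debug.Print vData(i)            'the CSV line of the record
    Next i
End Sub
```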

I really don't have the time/energy (I have Lou Gehrig's) to get any
more involved with your project due to current commitments. I just felt
it might be worth explaining how I'd handle your task in the hopes it
would be helpful to you reaching a viable solution. I bid you good
wishes going forward...

--
Garry


  #30   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

GS wrote:
I note that in Windoze, "\" is used, and on the web, "/" is used.


For clarity.., a Windows file path is not the same as a URL.

Windows file paths allow spaces; URLs do not.

* Incorrect! See -------------------------vvv
https://www.nsncenter.com/NSNSearch?...r&PageNumber=5

Windows path delimiter is "\"; Web path delimiter is "/".

The function URLDownloadToFile() downloads szURL to szFilename.

* IF and when it works.


Once downloaded, szFilename needs to be opened, parsed, and result
stored locally. Then the next file needs to be downloaded, parsed, and
stored. And so on until all files have been downloaded and parsed.

* This i knew from the git-go; nice to be clarified.


Since the actual page contents comprise only a small portion of the
files being downloaded, their size should be considerably smaller after
parsing. If you extract the data (text) only (no images) and save this
to a txt file you should be able to 'append' to a single file which
would result in occupying far less disc space. (For example, pg5 is less
than 1kb) I suspect, though, that you need the image to identify the
item source (Raytheon, RCA, Lucent Tech, NAWC, MIL STD, etc) because
this info is not stored in the image file metadata. Otherwise, the txt
file after parsing pg5's text is the following 53 lines:

NSN 5960-00-509-3171
5960-00-509-3171

ELECTRON TUBE

NSN 5960-00-569-9531
5960-00-569-9531

ELECTRON TUBE

NSN 5960-00-553-3770
5960-00-553-3770

ELECTRON TUBE

NSN 5960-00-682-8624
5960-00-682-8624

ELECTRON TUBE

NSN 5960-00-808-6928
5960-00-808-6928

ELECTRON TUBE

NSN 5960-00-766-1953
5960-00-766-1953

ELECTRON TUBE

NSN 5960-00-850-6169
5960-00-850-6169

ELECTRON TUBE

NSN 5960-00-679-8153
5960-00-679-8153

ELECTRON TUBE

NSN 5960-00-134-6884
5960-00-134-6884

ELECTRON TUBE

NSN 5960-00-061-8610
5960-00-061-8610

ELECTRON TUBE

5960-00-067-9636

ELECTRON TUBE

The file size is 711 bytes, and lists 11 items. Note the last item has
no image and so no filler text (NSN line). This inconsistency makes
parsing the contents difficult since you don't know which items do not
have images.

* I think you may have pulled the info from what you saw on that page,
and not from the source.
In one of my responses, i gave QBASIC code for parsing, and as i
remember, there were about 7760 lines of junk before one sees <a
href="/NSN/5960; which gives the full NSN code.
Use of that allows one to get the second URL, eg:
https://www.nsncenter.com/NSN/5960-00-754-5782 NO image reliance at all.
There are 11 entries per page, and no inconsistencies with my method
of search in the page.


If you copy/paste pg5 into Excel you get both text and image. You could
then do something to construct the info in a database fashion...

* That would only make things more difficult. A copy to a local file is
sufficient for a simple parsing as described here and elsewhere in this
thread.


Col Headers:
Source :: PartNum :: Description

..and put the data in the respective columns. This seems very
inefficient but is probably less daunting than what you've been doing
manually thus far. Auto Complete should be helpful with this, and you
could sort the list by Source. Note that clicking the image or part# on
the worksheet takes you to the same page as does clicking it on the web
page. In the case of pg5, the data will occupy 11 rows.

* Manual: Right click, select View Page Source, Save As to HD by
changing Filetype from HTM to TXT and changing the filename to add the
page number (013 for example).
Like i said, parsing of that file is simple and easy; getting 35
pages copied that way did not take long, but there are 999 of them...


Seems like your approach is the long way; -I'd find a better data source
myself! Perhaps subscribe to an electronics database utility (such as my
CAD software would use) that I can update by downloading a single db
file<g

* I have asked, and received zero response.




  #31   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

GS wrote:
So what you also want is the linked file (web page) the image or part#
links to! Here's what I got from
https://www.nsncenter.com/NSN/5960-00-831-8683 (pg4):

1st occurrence of <a href="/NSN/5960 is at line 7878;

1st occurrence of (MCRL) is at line 7931;

1st occurrence after that of <a href="/PartNumber" is this, at line 7951:
<td align="center" style="vertical-align: middle;"><a
href="/PartNumber/GV4S1400">GV4S1400</a></td>

and the next 3 lines are:
<td style="width: 125px; height: 60px; vertical-align: middle;"
align="center" nowrap>&nbsp;&nbsp;<a
href="/CAGE/63060">63060</a>&nbsp;&nbsp;</td>
<td align="center" style="vertical-align: middle;">&nbsp;&nbsp;<a
href="/CAGE/63060"><img class="img-thumbnail"
src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45
width=90 /></a>&nbsp;&nbsp;</td>
<td text-align="center" style="vertical-align: middle;"><a title="CAGE
63060" href="/CAGE/63060">HEICO OHMITE LLC</a></td>


So you want to go to the next page linked to and repeat the process?

At this point my Excel sheet has been modified as follows:

Source | NSN Item# | Description | Part# | MCRL#
Tektronix | 5960-00-831-8683 | ELECTRON TUBE | GV4S1400 | 4932653
<a href="/CAGE/63060">63060</a>
<a href="/CAGE/63060"><img class="img-thumbnail"
src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45
width=90 /></a>
<a title="CAGE 63060" href="/CAGE/63060">HEICO OHMITE LLC</a>

General Dynamics | 5960-00-853-8207 | ELECTRON TUBE | 295-29434 | 5074477
line1
line2
line3

..and so on.

So far, I'm working with text files and so I'm inclined to append each
item to a file named "ElectronTube_NSN5960.txt". File contents for the 2
items above would be structured so the 1st line contains headings (data
fields) so it works properly with ADODB. (Note that I use a comma as
delimiter, and the file does not contain any blank lines)...

Source,NSN Item#,Description,Part#,MCRL#
Tektronix,5960-00-831-8683,ELECTRON TUBE,GV4S1400,4932653
<a href="/CAGE/63060">63060</a>
<a href="/CAGE/63060"><img class="img-thumbnail"
src="https://placehold.it/90x45?text=No%0DImage%0DYet" height=45
width=90 /></a>
<a title="CAGE 63060" href="/CAGE/63060">HEICO OHMITE LLC</a>
General Dynamics,5960-00-853-8207,ELECTRON TUBE,295-29434,5074477
<a href="/CAGE/1VPW8">1VPW8</a>
<a href="/CAGE/1VPW8"><img class="img-thumbnail"
src="https://az774353.vo.msecnd.net/cage/90/1vpw8.jpg" alt="CAGE 1VPW8"
height=45 width=90 /></a>
<a title="CAGE 1VPW8" href="/CAGE/1VPW8">GENERAL DYNAMICS C4 SYSTEMS,
INC.</a>

..where I have parsed off the CSS formatting text and html tags outside
<a>...</a> from the 3 lines. I'd likely convert the UCase to proper case
as well.

The file size is 653 bytes meaning a full page would be about 4kb; 1000
pages being about 4mb. That's 44 lines per page after the fields line.

A file this size can be easily handled via ADO recordset or std VB file
I/O functions/methods. Loading into an array (vData) puts fields in
vData(0) and records starting at vData(1), and looping would Step 4.

I really don't have the time/energy (I have Lou Gehrig's) to get any
more involved with your project due to current commitments. I just felt
it might be worth explaining how I'd handle your task in the hopes it
would be helpful to you reaching a viable solution. I bid you good
wishes going forward...

* Thanks for the guide.


You are getting all of the right stuff from what i would call the
second file.
The first file is
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=" &
PageNum, where PageNum (in ASCII) goes from "1" to "999".
Note the (implied?) space in the URL.

I think that by now you have it all figured out.

In snooping around, i have just stumbled on the ADODB scheme, and
from what little i have found so far it looks very promising.
Only one example, which does not work (examples NEVER work), and zero
explanations so far.
It seems that with the proper code, that ADODB would allow me to copy
those first files to a HD.
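For what it's worth, one hedged sketch of that idea combines an MSXML
request with ADODB.Stream to save the raw page bytes to disk
(late-bound, so no references needed; untested against this particular
site):

```vb
Sub SavePageToDisk()
    Dim http As Object, stm As Object
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", _
        "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5", _
        False
    http.send
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = 1                        '1 = adTypeBinary
    stm.Open
    stm.Write http.responseBody         'raw bytes of the page
    stm.SaveToFile Environ("TEMP") & "\page005.txt", 2  '2 = overwrite
    stm.Close
End Sub
```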

Would you be so kind as to share your working ADODB code?
Or did you hand-copy the source like i did?

Thanks again.




  #32   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

You are getting all of the right stuff from what i would call the
second file.
The first file is
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber="
& PageNum where PagNum (in ASCII) goes from "1" to "999".
Note the (implied?) space in the URL.


I got Source, NSN Part#, Description from the 1st file. The NSN Item#
links to the 2nd file.

<snip
Would you be so kind as to share your working ADODB code?
Or did you hand-copy the source like i did?


I use std VB file I/O not ADODB. Initial procedure was to copy/paste
page source into Textpad and save as Tmp.txt. Then load the file into
an array and parse from there.

I thought I'd take a look at going with a userform and MS Web Browser
control for more flexible programming opts, but haven't had the time. I
assume this would definitely give you an advantage over trying to
automate IE, but I need to research using it. I do have URL functions
built into my fpSpread.ocx for doing this stuff, but that's an
expensive 3rd party AX component. Otherwise, doing this from Excel
isn't something I'm familiar with.

--
Garry


  #33   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

I thought I'd take a look at going with a userform and MS Web Browser
control for more flexible programming opts


While I'm on pause waiting to consult with client on my current
project...

This is doable; -I have a userform with web browser, a textbox, and
some buttons.

The Web Browser doesn't display AddressBar/StatusBar for some reason,
even though these props are set 'True'. (Initial URL (pg1) is
hard-coded as a result) You navigate to parent pages using Next/Last
buttons, starting with pg1 on load. Optionally, you can enter a page#
in a GoTo box.

The browser lets you select links, and you load its current document
page source into txtViewSrc via btnViewSrc. This action also Splits()
page source into vPgSrc for locating search text selected in
cboSearchTxt. The cboSearchTxt_Change event auto-locates your desired
lines at present, but I will have it appending them to file shortly.
This file will be structured same as illustrated earlier. I think this
could be fully automated after I see how the links are defined in their
innerHTML.

For now, I'll provide a button to write found lines because it gives an
opportunity to preview the data going into your file. This will happen
via loading found lines into vaLinesOut() which is sized 0to3. This
will make the search sequence important so the output file has its
lines in the correct order (top-to-bottom in page source).

I use my own file read/write procedures because they're configured for
large amounts of data in 1 shot to/from dbase.txt files, and so are
included in the userform class.

While there's still a manual element to this, it's going to be orders
of magnitude less daunting and more efficient than what you do now. It
seems highly likely over time that this entire task can be fully
automated just by entering the URL for pg1!

--
Garry


  #34   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

GS wrote:
You are getting all of the right stuff from what i would call the
second file.
The first file is
"https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=" &
PageNum where PagNum (in ASCII) goes from "1" to "999".
Note the (implied?) space in the URL.


I got Source, NSN Part#, Description from the 1st file. The NSN Item#
links to the 2nd file.

<snip
Would you be so kind as to share your working ADODB code?
Or did you hand-copy the source like i did?


I use std VB file I/O not ADODB. Initial procedure was to copy/paste
page source into Textpad and save as Tmp.txt. Then load the file into an
array and parse from there.

I thought I'd take a look at going with a userform and MS Web Browser
control for more flexible programming opts, but haven't had the time. I
assume this would definitely give you an advantage over trying to
automate IE, but I need to research using it. I do have URL functions
built into my fpSpread.ocx for doing this stuff, but that's an expensive
3rd party AX component. Otherwise, doing this from Excel isn't something
I'm familiar with.

Check. I know QBASIC fairly well, so a lot of that knowledge crosses
over to VB.
Someone here was kind enough to give me a full working program that
can be used to copy a URL source to a temp file on the HD.
Once available all else is very simple and straight forward.
The rub is that function (or something it uses) does not allow a
space in the URL, AND also does not allow https.
So, i need two work-arounds, and the https part would seem to be the
worst.
I do not know how it works, what DLLs/libraries it calls; no useful
information is available.
It is:
Declare Function URLDownloadToFile Lib "urlmon" _
Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
ByVal szURL As String, ByVal szFileName As String, _
ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Only the well-known keywords can be found; 'urlmon', 'pCaller',
'szURL', and 'szFileName' are unknowns and not findable in the
so-called VB help.
And there are no examples; the few ranDUMB ones are incomplete
and/or do not work.
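For reference, here is that same declaration with the 64-bit (PtrSafe)
form guarded by conditional compilation; the parameter notes below
summarize the urlmon documentation:

```vb
#If VBA7 Then
    Declare PtrSafe Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" (ByVal pCaller As LongPtr, _
        ByVal szURL As String, ByVal szFileName As String, _
        ByVal dwReserved As Long, ByVal lpfnCB As LongPtr) As Long
#Else
    Declare Function URLDownloadToFile Lib "urlmon" _
        Alias "URLDownloadToFileA" (ByVal pCaller As Long, _
        ByVal szURL As String, ByVal szFileName As String, _
        ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long
#End If
'pCaller    : caller's COM object (IUnknown); pass 0 from VBA
'szURL      : the URL to fetch (ANSI, because of the "A" alias)
'szFileName : full path of the local file to create
'dwReserved : reserved; must be 0
'lpfnCB     : optional status callback; pass 0
'Return     : 0 (S_OK) on success, a nonzero COM error code otherwise
```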

I do not see how you use std VB file I/O; AFAIK one cannot open a web
page as if it was a file.




  #35   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

GS wrote:
I thought I'd take a look at going with a userform and MS Web Browser
control for more flexible programming opts


While I'm on pause waiting to consult with client on my current project...

This is doable; -I have a userform with web browser, a textbox, and some
buttons.

The Web Browser doesn't display AddressBar/StatusBar for some reason,
even though these props are set 'True'. (Initial URL (pg1) is hard-coded
as a result) You navigate to parent pages using Next/Last buttons,
starting with pg1 on load. Optionally, you can enter a page# in a GoTo box.

The browser lets you select links, and you load its current document
page source into txtViewSrc via btnViewSrc. This action also Splits()
page source into vPgSrc for locating search text selected in
cboSearchTxt. The cboSearchTxt_Change event auto-locates your desired
lines at present, but I will have it appending them to file shortly.
This file will be structured same as illustrated earlier. I think this
could be fully automated after I see how the links are defined in their
innerHTML.

For now, I'll provide a button to write found lines because it gives an
opportunity to preview the data going into your file. This will happen
via loading found lines into vaLinesOut() which is sized 0to3. This will
make the search sequence important so the output file has its lines in
the correct order (top-to-bottom in page source).

I use my own file read/write procedures because they're configured for
large amounts of data in 1 shot to/from dbase.txt files, and so are
included in the userform class.

While there's still a manual element to this, it's going to be orders of
magnitude less daunting and more efficient than what you do now. It
seems highly likely over time that this entire task can be fully
automated just by entering the URL for pg1!

Way beyond me.
If in HTML one can copy an <a href="..." to the hard drive, then
that is all i need.



  #36   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

GS wrote:
I thought I'd take a look at going with a userform and MS Web Browser
control for more flexible programming opts


While I'm on pause waiting to consult with client on my current
project...

This is doable; -I have a userform with web browser, a textbox, and
some buttons.

The Web Browser doesn't display AddressBar/StatusBar for some reason,
even though these props are set 'True'. (Initial URL (pg1) is
hard-coded as a result) You navigate to parent pages using Next/Last
buttons, starting with pg1 on load. Optionally, you can enter a page#
in a GoTo box.

The browser lets you select links, and you load its current document
page source into txtViewSrc via btnViewSrc. This action also Splits()
page source into vPgSrc for locating search text selected in
cboSearchTxt. The cboSearchTxt_Change event auto-locates your desired
lines at present, but I will have it appending them to file shortly.
This file will be structured same as illustrated earlier. I think this
could be fully automated after I see how the links are defined in
their innerHTML.

For now, I'll provide a button to write found lines because it gives
an opportunity to preview the data going into your file. This will
happen via loading found lines into vaLinesOut() which is sized 0to3.
This will make the search sequence important so the output file has
its lines in the correct order (top-to-bottom in page source).

I use my own file read/write procedures because they're configured
for large amounts of data in 1 shot to/from dbase.txt files, and so
are included in the userform class.

While there's still a manual element to this, it's going to be orders
of magnitude less daunting and more efficient than what you do now.
It seems highly likely over time that this entire task can be fully
automated just by entering the URL for pg1!

Way beyond me.
If in HTML one can copy an <a href="..." to the hard drive, then
that is all i need.


Just another approach, since you seem to be having difficulty getting
URLDownloadToFile() to work.

My approach reads innerHTML of web pages and outputs to txt file. Not
sure why you want to grab html and save to disc given the file size is
a concern. My approach puts parsed data from all 999 pages into a txt
file less than 4mb in size. Once the individual steps have been
optimized, automating the entire process will be easy. (I'll leave that
part for you to do however you want it to work)

I will post the contents of my fParseWebPages.frm file. You will need
to set a ref to the Microsoft Web Browser to use it.

--
Garry


  #37   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 1,182
Default Read (and parse) file on the web CORRECTION#2

I felt you were going about this task wrongly. That's why I did my own
approach; -to illustrate an easier way to get the job done. And it's
doing just that!

I'll try to get the working solution to you over this long weekend.

--
Garry


  #38   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web CORRECTION#2

GS wrote:
GS wrote:
I thought I'd take a look at going with a userform and MS Web Browser
control for more flexible programming opts

While I'm on pause waiting to consult with client on my current
project...

This is doable; -I have a userform with web browser, a textbox, and some
buttons.

The Web Browser doesn't display AddressBar/StatusBar for some reason,
even though these props are set 'True'. (Initial URL (pg1) is hard-coded
as a result) You navigate to parent pages using Next/Last buttons,
starting with pg1 on load. Optionally, you can enter a page# in a
GoTo box.

The browser lets you select links, and you load its current document
page source into txtViewSrc via btnViewSrc. This action also Splits()
page source into vPgSrc for locating search text selected in
cboSearchTxt. The cboSearchTxt_Change event auto-locates your desired
lines at present, but I will have it appending them to file shortly.
This file will be structured the same as illustrated earlier. I think
this could be fully automated after I see how the links are defined in
their innerHTML.

For now, I'll provide a button to write found lines because it gives an
opportunity to preview the data going into your file. This will happen
via loading found lines into vaLinesOut(), which is sized 0 To 3. This will
make the search sequence important so the output file has its lines in
the correct order (top-to-bottom in page source).

I use my own file read/write procedures because they're configured for
large amounts of data in 1 shot to/from dbase.txt files, and so are
included in the userform class.

While there's still a manual element to this, it's going to be orders of
magnitude less daunting and more efficient than what you do now. It
seems highly likely over time that this entire task can be fully
automated just by entering the URL for pg1!

Way beyond me.
If in HTML one can copy an <a href="..." to the hard drive, then that
is all i need.


Just another approach, since you seem to be having difficulty getting
URLDownloadToFile() to work.

* snooped around the Excel VB help to find alternate ways (i remember
seeing other ways). Here was the first step:
SRC1$ = "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5"

'test..
With Worksheets(1)
    .Hyperlinks.Add .Range("E5"), SRC1$
End With
That places the URL in cell E5 and it looked good.
Clicking once on it "to follow", i get error message "Unable to
open... Cannot download the information you requested."
This may have the same 2 problems, the (implied) space and the https.
So, to test those, i added a folder on one of my websites:
http://www.oil4lessllc.com/Try%20.space/
...and that works.
Therefore, this scheme is one better than the function.
Next, i tested the https with: https://duckduckgo.com/ .
I get that error message.
So i am still dead in the water.
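
One alternative worth trying for the https failures: the MSXML2
XMLHTTP object negotiates https itself, so no API declaration (and no
temp file) is needed. A minimal, hedged sketch (the procedure name is
invented; the URL is the one from the post above):

Sub FetchPageSource()
    'Sketch: GET a page over https via MSXML2 instead of URLDownloadToFile.
    Dim oHttp As Object, sSrc As String
    Set oHttp = CreateObject("MSXML2.XMLHTTP")
    oHttp.Open "GET", _
        "https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5", _
        False
    oHttp.send
    If oHttp.Status = 200 Then
        sSrc = oHttp.responseText   'raw HTML source, ready to parse
    Else
        MsgBox "Request failed, status " & oHttp.Status
    End If
End Sub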


My approach reads innerHTML of web pages and outputs to txt file.

* THAT is what i cannot do; AFAIK neither HTML5 nor JavaScript running
in the browser can write to a file on the local disk.
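
That limit applies to script running in the browser; the VBA side has
no such restriction. A minimal sketch of writing parsed text to disk
(the procedure name and path are only examples):

Sub WriteTextFile(ByVal sText As String, ByVal sPath As String)
    'Plain VBA text output; no HTML5/JavaScript involved.
    Dim f As Integer
    f = FreeFile
    Open sPath For Output As #f
    Print #f, sText
    Close #f
End Sub

'e.g.  WriteTextFile sSrc, "C:\temp\parsed.txt"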

Not sure why you want to grab html and save to disc given the file
size is a concern.

* Well, when i first started, i got a bit scared because of those 7750
(appx) lines; first time i saw a web site that huge.
Now that i have processed 30 pages, i know better.

My approach puts parsed data from all 999 pages into a txt file
less than 4mb in size. Once the individual steps have been optimized,
automating the entire process will be easy. (I'll leave that part for
you to do however you want it to work)

* So far, in that direction, i have:
<header>
<a href="https://www.nsncenter.com/NSNSearch?q=5960%20regulator&PageNumber=5"
   id="NSN5860">hello<BR></a>
<script>
var T = document.anchors;
document.getElementById("NSN5860").innerHTML = T;
// open an output stream in a new window
document.open("text/html", "replace");
document.writeln(T + "|<BR>");
// display in new window
document.close();
</script>
<!-- above gets
[object HTMLCollection]
on a new screen/window, where that object item is the href pointer
-->


I will post the contents of my fParseWebPages.frm file. You will need to
set a ref to the Microsoft Web Browser to use it.


  #39   Report Post  
Posted to microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

GS wrote:
Well, i am in a pickle.
Firstly, i did a bit of experimenting, and i discovered a few things.
1) The variable "tmp" is nice, but does not have to have the date and
time; that would fill the HD since i have thousands of files to process.


So you did not 'catch' that the tmp file is deleted when this line...

Kill tmp

..gets executed!

Been thru this already... also, it makes no sense to have a name miles long.

  #40   Report Post  
Posted to microsoft.public.excel.programming
external usenet poster
 
Posts: 93
Default Read (and parse) file on the web

What is with this a*hole posting this sh*t here?

Powered by vBulletin® Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright 2004-2024 ExcelBanter.
The comments are property of their posters.
 
