webscraperhelper: Generating SXPath Queries from SXML Examples
(require webscraperhelper) | package: webscraperhelper
1 Introduction
(define doc
  '(*TOP*
    (html
     (head (title "My Title"))
     (body (@ (bgcolor "white"))
           (p "Summary: This is a document.")
           (div (@ (id "ResultsSection"))
                (h2 "Results")
                (p "These are the results.")
                (table (@ (id "ResultTable"))
                       (tr (td (b "Input:")) (td "2 + 2"))
                       (tr (td (b "Output:")) (td "Four")))
                (p "Lookin' good!"))))))
> (webscraperhelper '(td "Four") doc)
Absolute SXPath: (html body div table (tr 2) (td 2))
Absolute SXPath with IDs: (html body (div (@ (equal? (id "ResultsSection")))) (table (@ (equal? (id "ResultTable")))) (tr 2) (td 2))
Relative SXPath with IDs: (// (table (@ (equal? (id "ResultTable")))) (tr 2) (td 2))
> (define query (sxpath '(// (table (@ (equal? (id "ResultTable")))) (tr 2) (td 2))))
> (query doc)
((td "Four"))
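As a quick check on the transcript above, the following sketch runs each of the three printed queries against the same document and then shows one way to pull out only the cell's text. It assumes the `sxml` package is installed and provides `sxpath`. The names `queries`, `results`, and `cell-text` are illustrative, and the trailing `*text*` step is an addition of this sketch, not something webscraperhelper prints.

```racket
#lang racket/base
;; Assumption: the sxml package is installed (raco pkg install sxml)
;; and provides sxpath.
(require sxml)

(define doc
  '(*TOP*
    (html
     (head (title "My Title"))
     (body (@ (bgcolor "white"))
           (p "Summary: This is a document.")
           (div (@ (id "ResultsSection"))
                (h2 "Results")
                (p "These are the results.")
                (table (@ (id "ResultTable"))
                       (tr (td (b "Input:")) (td "2 + 2"))
                       (tr (td (b "Output:")) (td "Four")))
                (p "Lookin' good!"))))))

;; The three queries printed by webscraperhelper in the transcript.
(define queries
  '((html body div table (tr 2) (td 2))
    (html body
          (div (@ (equal? (id "ResultsSection"))))
          (table (@ (equal? (id "ResultTable"))))
          (tr 2) (td 2))
    (// (table (@ (equal? (id "ResultTable")))) (tr 2) (td 2))))

;; Run every query against the document; each should select the same cell.
(define results
  (for/list ([q (in-list queries)])
    ((sxpath q) doc)))

;; Illustrative extension: append *text* to select the cell's text nodes.
(define cell-text
  ((sxpath '(// (table (@ (equal? (id "ResultTable")))) (tr 2) (td 2) *text*))
   doc))

(for-each (lambda (r) (printf "~s\n" r)) results)
(printf "~s\n" cell-text)
```

Each entry of `results` should be `((td "Four"))`, matching the transcript, and `cell-text` should be `("Four")`. The ID-based queries are likely to survive page layout changes better than the purely positional absolute path, since they anchor on an `id` attribute rather than on the full chain of ancestors.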
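Webscraperhelper helps a programmer scrape the Web: given an example SXML element and the document that contains it, it prints SXPath queries that select that element.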
2 Interactive Interface
procedure
  goal : any/c
  sxml : sxml?
  ids : (listof symbol?) = '(id)
3 Programmatic Interface
procedure
  path : any/c
(wsh-path->sxpath-absids+relids path) → any
  path : any/c
(wsh-path->sxpath-abs+absids+relids path) → any
  path : any/c
4 History
- Version 2:0 (2016-02-28): Moving from PLaneT to new package system.
- Version 1:2 (2009-03-14): Minor documentation change.
- Version 1:1 (2009-02-24): License now LGPL 3. Converted to author’s new Scheme administration system.
- Version 1:0 (2005-07-04): Documentation update, plus get it into PLaneT 299/3xx.
- Version 0.2 (2004-08-16): Corrected typographical error in attributions.
- Version 0.1 (2004-07-31): Initial version.
5 Legal
Copyright 2004, 2005, 2009, 2016 Neil Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See http://www.gnu.org/licenses/ for details. For other licenses and consulting, please contact the author.