htmlquery
Overview
htmlquery is an XPath query package for HTML. It lets you extract data from HTML documents, or evaluate values against them, using XPath expressions.
Changelogs
2019-02-04
- #7 Removed the deprecated FindEach() and FindEachWithBreak() methods.
2018-12-28
- Avoid adding duplicate elements to the list returned by the Find() method. #6
Installation
$ go get github.com/antchfx/htmlquery
Getting Started
Load HTML document from URL.
doc, err := htmlquery.LoadURL("http://example.com/")
Load HTML document from string.
s := `<html>....</html>`
doc, err := htmlquery.Parse(strings.NewReader(s))
Find all A elements.
list := htmlquery.Find(doc, "//a")
Find all A elements that have an href attribute.
list := htmlquery.Find(doc, "//a[@href]")
Find all A elements and select only their href attributes.
list := htmlquery.Find(doc, "//a/@href")
Find the third A element.
a := htmlquery.FindOne(doc, "//a[3]")
Evaluate the number of all IMG elements.
expr, _ := xpath.Compile("count(//img)")
v := expr.Evaluate(htmlquery.CreateXPathNavigator(doc)).(float64)
fmt.Printf("total count is %f", v)
Quick Tutorial
package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	doc, err := htmlquery.LoadURL("https://www.bing.com/search?q=golang")
	if err != nil {
		panic(err)
	}
	// Find all news items.
	for i, n := range htmlquery.Find(doc, "//ol/li") {
		a := htmlquery.FindOne(n, "//a")
		if a == nil {
			continue // skip list items without a link
		}
		fmt.Printf("%d %s(%s)\n", i, htmlquery.InnerText(a), htmlquery.SelectAttr(a, "href"))
	}
}
List of supported XPath query packages
| Name | Description |
|---|---|
| htmlquery | XPath query package for the HTML document |
| xmlquery | XPath query package for the XML document |
| jsonquery | XPath query package for the JSON document |
Questions
Please let me know if you have any questions.