A small web-crawler example with Go + XPath

Golang
2022-05-15

Crawler steps

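  • Define the target (decide which website to crawl)
  • Crawl (download the content)
  • Extract (filter out the parts you want)
  • Process the data (handle it however you need)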

Dependency

go mod init crawler    # only if the project has no go.mod yet; "crawler" is an arbitrary module name
go get github.com/antchfx/htmlquery

The code:

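package main

import (
    "fmt"
    "strings"
    "sync"

    "github.com/antchfx/htmlquery"
)

const url = "https://learnku.com/go"

var wg sync.WaitGroup

// ParseTitles downloads the page at url and prints the text of every topic title.
func ParseTitles() {
    defer wg.Done()
    // Safety net: recover from any panic so the program still finishes.
    // Call recover() only once; a second call returns nil.
    defer func() {
        if r := recover(); r != nil {
            fmt.Println("recovered:", r)
        }
    }()

    // Crawl: LoadURL downloads the page and parses it into an HTML node tree.
    doc, err := htmlquery.LoadURL(url)
    if err != nil {
        fmt.Println("failed to load URL:", err)
        return
    }

    // Extract: select the text nodes of every <span class="topic-title">.
    rules := "//span[@class='topic-title']/text()"
    nodes, err := htmlquery.QueryAll(doc, rules)
    if err != nil {
        fmt.Println("not a valid XPath expression:", err)
        return
    }
    if len(nodes) == 0 {
        fmt.Println("no content found")
        return
    }

    // Process: trim surrounding whitespace and skip empty text nodes.
    for _, node := range nodes {
        res := strings.TrimSpace(htmlquery.InnerText(node))
        if res != "" {
            fmt.Printf("parse value == %s\n", res)
        }
    }
}

func main() {
    wg.Add(1)
    go ParseTitles()
    wg.Wait()
    fmt.Println("爬虫完成") // "爬虫完成" means "crawl finished"
}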
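Inside a deferred function, `recover()` returns the panic value only on its first call. That call stops the panic, so any later call returns nil. For example, `if recover() != nil { fmt.Println(recover()) }` prints `<nil>` instead of the panic message, which is why the value should be stored in a variable first. The following stdlib-only demonstration shows this. `safeCall` and `doubleRecover` are illustrative names and do not come from the article.

```go
package main

import "fmt"

// safeCall runs f and returns the value of any panic it raises.
// recover() is called exactly once and its result is kept.
func safeCall(f func()) (recovered any) {
	defer func() {
		recovered = recover()
	}()
	f()
	return nil
}

// doubleRecover mimics calling recover() twice: the first call stops the
// panic, so the second call always returns nil.
func doubleRecover(f func()) (first, second any) {
	defer func() {
		first = recover()
		second = recover()
	}()
	f()
	return nil, nil
}

func main() {
	fmt.Println(safeCall(func() { panic("failed to load URL") }))
	first, second := doubleRecover(func() { panic("boom") })
	fmt.Println(first, second)
}
```

Running it prints `failed to load URL`, then `boom <nil>`.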

Output

......
parse value == JWT身份认证(附带源码讲解)
parse value == [系列文章] Go 学习笔记 - Go 基础语法(2)
parse value == 第 14 课:并发 concurrency ?《Go 编程基础(视频)》
parse value == 组合函数 Collection《Go 编程实例 Go by Example 2020
parse value == 今日面试总结
爬虫完成