2025-03-14 21:59:59 -06:00

# htmlq

> Use CSS selectors to extract content from HTML files.
> More information: <https://github.com/mgdm/htmlq#usage>.

- Return all elements of class `card`:

`cat {{path/to/file.html}} | htmlq '.card'`

- Get the text content of the first paragraph:

`cat {{path/to/file.html}} | htmlq {{[-t|--text]}} 'p:first-of-type'`

- Find all the links in a page:

`cat {{path/to/file.html}} | htmlq {{[-a|--attribute]}} href 'a'`

- Remove all images and SVGs from a page:

`cat {{path/to/file.html}} | htmlq {{[-r|--remove-nodes]}} 'img' {{[-r|--remove-nodes]}} 'svg'`

- Pretty print and write the output to a file:

`htmlq {{[-p|--pretty]}} {{[-f|--filename]}} {{path/to/input.html}} {{[-o|--output]}} {{path/to/output.html}}`