<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="Content-Security-Policy" content="default-src 'self'; script-src 'self'; style-src 'self'; font-src 'self' data:; img-src 'self' data:; connect-src 'self'; base-uri 'self'; form-action 'self' https://defcon.social https://bsky.app;">
<meta http-equiv="X-Content-Type-Options" content="nosniff">
<link rel="stylesheet" href="../assets/css/style.css">
<link rel="icon" type="image/x-icon" href="../favicon.ico">
<script>
// Apply theme immediately to prevent flash
(function() {
const theme = localStorage.getItem('theme') ||
(window.matchMedia && window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light');
document.documentElement.setAttribute('data-theme', theme);
})();
</script>
<title>wget Cheatsheet - Cheatsheets - Launch Pad</title>
</head>
<body>
<button class="theme-toggle" id="themeToggle" aria-label="Toggle dark mode">
<svg class="theme-icon theme-icon-moon" xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M21 12.79A9 9 0 1 1 11.21 3 7 7 0 0 0 21 12.79z"></path></svg>
<svg class="theme-icon theme-icon-sun" xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" style="display: none;"><circle cx="12" cy="12" r="5"></circle><line x1="12" y1="1" x2="12" y2="3"></line><line x1="12" y1="21" x2="12" y2="23"></line><line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line><line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line><line x1="1" y1="12" x2="3" y2="12"></line><line x1="21" y1="12" x2="23" y2="12"></line><line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line><line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line></svg>
</button>
<br/><br/>
<div class="name">
__ _______________________ _________._________________________
\_ _____/ \______ \ / _ \ / _____/ / _____/ | | \_ _____/
| __) | _/ / /_\ \ / \ ___ / \ ___ | | | __)_
| \ | | \ / | \ \ \_\ \ \ \_\ \ | |___ | \
\___ / |____|_ / \____|__ / \______ / \______ / |_______ \ /_______ /
\/ \/ \/ \/ \/ \/ \/
</div>
<div class="blog-page-header">
<div class="blog-header-content">
<a href="/cheatsheets" class="back-link" title="Back to Cheatsheets">
<svg xmlns="http://www.w3.org/2000/svg" width="42" height="42" viewBox="0 0 24 24" class="home-icon"><path fill="currentColor" d="M10 20v-6h4v6h5v-8h3L12 3 2 12h3v8z"/></svg>
</a>
<h1 class="blog-page-title">wget Cheatsheet</h1>
</div>
</div>
<div class="blog-post-container">
<div class="blog-posts-container" style="max-width: 900px; margin: 0 auto;">
<div class="blog-post">
<div class="blog-post-content">
<p><a href="index.html">← Back to cheatsheets</a></p>
<p><a href="../index.html">← Home</a></p>
<hr>
<p>wget is a non-interactive network downloader for HTTP, HTTPS, and FTP. Because it needs no user interaction, it suits scripts and cron jobs, and it can resume interrupted transfers and recursively mirror websites.</p>
<hr>
<h2>Basic Usage</h2>
<ul>
<li>wget &lt;url&gt; - Download file</li>
</ul>
<ul>
<li>wget -O filename &lt;url&gt; - Save with custom name</li>
</ul>
<ul>
<li>wget -P /path/ &lt;url&gt; - Save to directory</li>
</ul>
<hr>
<h2>Download Options</h2>
<ul>
<li>-O filename - Output filename</li>
</ul>
<ul>
<li>-P prefix - Directory prefix</li>
</ul>
<ul>
<li>-c - Continue partial download</li>
</ul>
<ul>
<li>-N - Timestamping (download only if the remote file is newer)</li>
</ul>
<ul>
<li>-nc - No clobber (skip existing)</li>
</ul>
<ul>
<li>-b - Background download (output goes to wget-log)</li>
</ul>
<ul>
<li>-q - Quiet mode</li>
</ul>
<ul>
<li>-v - Verbose (the default; use -nv for one line per file)</li>
</ul>
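<p>Here is a sketch of a cron-style refresh job. It assumes a hypothetical report URL and a local data directory. The -N flag re-downloads the file only when the server copy is newer, -nv keeps the output to one line per file, and -a appends wget's messages to a log. GNU wget refuses to run with both -N and -nc, so pick one.</p>
<pre><code># Refresh a local copy only when the remote file has changed (URL is hypothetical).
mkdir -p data
# -N  timestamping: compare the remote Last-Modified time with the local file
# -nv non-verbose: one summary line per file
# -a  append wget's messages to data/refresh.log instead of the terminal
# -P  save into the data directory
wget -N -nv -a data/refresh.log -P data https://example.com/reports/daily.csv</code></pre>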
<hr>
<h2>Recursive Downloads</h2>
<ul>
<li>-r - Recursive download</li>
</ul>
<ul>
<li>-l depth - Recursion depth (default 5)</li>
</ul>
<ul>
<li>-k - Convert links for local viewing</li>
</ul>
<ul>
<li>-p - Download page requisites</li>
</ul>
<ul>
<li>-E - Add .html to HTML pages saved without that extension</li>
</ul>
<ul>
<li>-np - No parent (don't go up)</li>
</ul>
<ul>
<li>-nH - No host directories</li>
</ul>
<ul>
<li>--cut-dirs=N - Drop the first N remote directory components when saving</li>
</ul>
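<p>You can see how -r, -np, -nH and --cut-dirs shape the local directory tree without crawling a real site. This sketch points wget at a throwaway local server and makes a few assumptions:</p>
<ul>
<li>python3 3.7 or later is installed, for http.server's --directory option.</li>
<li>Port 8000 on 127.0.0.1 is free.</li>
<li>The two demo pages are made up for the example.</li>
</ul>
<pre><code># Demo site: docs/manual/index.html links to docs/manual/page2.html.
site=$(mktemp -d)
mkdir -p "$site/docs/manual"
# \074 and \076 are octal escapes for the two angle brackets of the link tag
printf '\074a href="page2.html"\076next\074/a\076\n' > "$site/docs/manual/index.html"
printf 'page 2\n' > "$site/docs/manual/page2.html"

# Serve the demo site on 127.0.0.1:8000 in the background.
python3 -m http.server 8000 --bind 127.0.0.1 --directory "$site" >/dev/null 2>&1 &
server=$!
sleep 1

# -r                   recurse into links
# -np                  never ascend above /docs/manual/
# -nH                  drop the "127.0.0.1:8000" host directory
# --cut-dirs=1         drop the leading "docs" path component
# -P out               save everything under ./out
# --retry-connrefused  retry if the server is not listening yet
wget -q -r -np -nH --cut-dirs=1 -P out --retry-connrefused http://127.0.0.1:8000/docs/manual/

kill "$server"
rm -rf "$site"
find out -type f | sort
# Expected: out/manual/index.html and out/manual/page2.html</code></pre>
<p>Without -nH, the same files should land under out/127.0.0.1:8000/manual/ instead.</p>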
<hr>
<h2>Filtering</h2>
<ul>
<li>-A list - Accept comma-separated suffixes or patterns (e.g. jpg,png)</li>
</ul>
<ul>
<li>-R list - Reject comma-separated suffixes or patterns</li>
</ul>
<ul>
<li>-I list - Include directories</li>
</ul>
<ul>
<li>-X list - Exclude directories</li>
</ul>
<ul>
<li>--accept-regex=regex - Accept only URLs matching the regex</li>
</ul>
<ul>
<li>--reject-regex=regex - Reject URLs matching the regex</li>
</ul>
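<p>Here is a sketch for pulling only PDFs from a hypothetical /docs/ tree. It skips directory-listing sort links such as ?C=M;O=A and anything under /archive/. The regexes are matched against the full URL. Assuming wget's default posix regex type uses extended syntax, grep -E gives a quick offline preview of what would be rejected. -X /docs/archive is another way to exclude that directory.</p>
<pre><code># Reject URLs containing a literal "?" or the /archive/ path (pattern is an example).
reject='(\?|/archive/)'

printf '%s\n' \
  'https://example.com/docs/guide.pdf' \
  'https://example.com/docs/?C=M;O=A' \
  'https://example.com/docs/archive/old.pdf' > sample-urls.txt

# Preview the URLs that --reject-regex would drop
grep -E "$reject" sample-urls.txt > rejected.txt
cat rejected.txt

# -A pdf keeps only .pdf files; -nd saves them flat into ./pdfs
wget -r -np -nd -A pdf --reject-regex "$reject" -P pdfs https://example.com/docs/</code></pre>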
<hr>
<h2>HTTP Options</h2>
<ul>
<li>--header="Header: Value" - Custom header</li>
</ul>
<ul>
<li>-U agent - User agent</li>
</ul>
<ul>
<li>--referer=url - Set referer</li>
</ul>
<ul>
<li>--post-data="data" - POST data</li>
</ul>
<ul>
<li>--post-file=file - POST from file</li>
</ul>
<ul>
<li>--method=METHOD - HTTP method</li>
</ul>
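<p>Here is a sketch of sending JSON to an API endpoint; the endpoint URL and payload are hypothetical. --post-file sends the file body unchanged, and --header overrides the default application/x-www-form-urlencoded Content-Type. -O saves the response body.</p>
<pre><code># Write the JSON body to a file (payload is made up for the example).
printf '%s\n' '{"name": "launch-pad", "enabled": true}' > payload.json

# POST it with an explicit JSON content type and save the reply to response.json.
wget -q \
  --header='Content-Type: application/json' \
  --header='Accept: application/json' \
  --post-file=payload.json \
  -O response.json \
  https://example.com/api/items</code></pre>
<p>For other verbs, recent wget versions pair --method (for example --method=PUT) with --body-file=payload.json.</p>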
<hr>
<h2>Authentication</h2>
<ul>
<li>--user=user - Username (FTP and HTTP)</li>
</ul>
<ul>
<li>--password=pass - Password (FTP and HTTP)</li>
</ul>
<ul>
<li>--ask-password - Prompt for password</li>
</ul>
<ul>
<li>--http-user=user - HTTP auth user</li>
</ul>
<ul>
<li>--http-password=pass - HTTP auth pass</li>
</ul>
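<p>Passing --password on the command line can leave the secret in shell history and process listings. --ask-password prompts for it instead. For unattended jobs, one option is a private startup file loaded with --config, which recent wget releases read instead of the default wgetrc files for that run. The file name and credentials below are placeholders.</p>
<pre><code># Create a credentials file readable only by the current user (values are placeholders).
umask 077
printf '%s\n' 'user = admin' 'password = secret' > site.wgetrc

# wget reads this file instead of ~/.wgetrc for this run only.
wget --config=site.wgetrc https://example.com/protected/report.pdf

# Interactive alternative: prompt for the password instead of storing it.
# wget --user=admin --ask-password https://example.com/protected/report.pdf</code></pre>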
<hr>
<h2>Cookies</h2>
<ul>
<li>--save-cookies file - Save cookies</li>
</ul>
<ul>
<li>--load-cookies file - Load cookies</li>
</ul>
<ul>
<li>--keep-session-cookies - Keep session cookies</li>
</ul>
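<p>A typical two-step session looks like this. It assumes a hypothetical login form at /login with fields named user and password.</p>
<ul>
<li>The first request logs in and writes the cookies to cookies.txt. --keep-session-cookies is needed because session cookies are otherwise not saved.</li>
<li>The second request sends those cookies back.</li>
</ul>
<pre><code># Step 1: log in and save every cookie the server sets, including session cookies.
# (Form field names and credentials are placeholders.)
wget -a session.log \
  --save-cookies cookies.txt --keep-session-cookies \
  --post-data 'user=admin&amp;password=secret' \
  -O /dev/null \
  https://example.com/login

# Step 2: send the saved cookies with the next request.
wget -a session.log \
  --load-cookies cookies.txt \
  -O export.csv \
  https://example.com/account/export.csv</code></pre>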
<hr>
<h2>SSL/TLS</h2>
<ul>
<li>--no-check-certificate - Skip server certificate verification (insecure)</li>
</ul>
<ul>
<li>--ca-certificate=file - CA cert file</li>
</ul>
<ul>
<li>--certificate=file - Client cert</li>
</ul>
<ul>
<li>--private-key=file - Private key</li>
</ul>
<hr>
<h2>Proxy</h2>
<ul>
<li>-e use_proxy=yes - Enable proxy</li>
</ul>
<ul>
<li>-e http_proxy=url - HTTP proxy</li>
</ul>
<ul>
<li>-e https_proxy=url - HTTPS proxy</li>
</ul>
<ul>
<li>--no-proxy - Disable proxy</li>
</ul>
<hr>
<h2>Speed & Limits</h2>
<ul>
<li>--limit-rate=rate - Limit download speed (e.g. 200k, 1m)</li>
</ul>
<ul>
<li>-w seconds - Wait between requests</li>
</ul>
<ul>
<li>--random-wait - Vary the -w wait between 0.5x and 1.5x</li>
</ul>
<ul>
<li>-t tries - Number of tries (default 20; 0 or inf = infinite)</li>
</ul>
<ul>
<li>-T seconds - Network timeout (DNS, connect, and read)</li>
</ul>
<ul>
<li>--dns-timeout=seconds - DNS lookup timeout</li>
</ul>
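<p>Here is a polite-crawl sketch combining these limits; the target URL is hypothetical.</p>
<ul>
<li>-w 2 with --random-wait spaces requests between 1 and 3 seconds.</li>
<li>--limit-rate caps bandwidth.</li>
<li>-t and -T bound retries and hangs.</li>
<li>-o writes every message to a log file, which is handy for long runs.</li>
</ul>
<pre><code># Crawl politely: about 2 s between requests (randomized to 1-3 s), 100 KB/s cap,
# 3 tries per file, 30 s network timeout, all messages written to crawl.log.
wget -r -np -l 2 \
  -w 2 --random-wait \
  --limit-rate=100k \
  -t 3 -T 30 \
  -o crawl.log \
  https://example.com/docs/</code></pre>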
<hr>
<h2>Common Examples</h2>
<h3>Simple Download</h3>
<pre><code>wget https://example.com/file.zip</code></pre>
<p>Download single file.</p>
<h3>Resume Download</h3>
<pre><code>wget -c https://example.com/large-file.iso</code></pre>
<p>Continue interrupted download.</p>
<h3>Download to Directory</h3>
<pre><code>wget -P ~/Downloads https://example.com/file.zip</code></pre>
<p>Save to specific directory.</p>
<h3>Mirror Website</h3>
<pre><code>wget -m -k -p -E -np https://example.com</code></pre>
<p>Create offline mirror of site.</p>
<h3>Download All Images</h3>
<pre><code>wget -r -A jpg,jpeg,png,gif -np https://example.com/images/</code></pre>
<p>Download only images recursively.</p>
<h3>Background Download</h3>
<pre><code>wget -b https://example.com/large-file.iso</code></pre>
<p>Download in background.</p>
<h3>Rate Limited</h3>
<pre><code>wget --limit-rate=200k https://example.com/file.zip</code></pre>
<p>Limit to 200KB/s.</p>
<h3>With Authentication</h3>
<pre><code>wget --user=admin --password=secret https://example.com/protected/</code></pre>
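<p>Download with credentials. A password on the command line can end up in shell history and process listings; --ask-password prompts for it instead.</p>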
<h3>From File List</h3>
<pre><code>wget -i urls.txt</code></pre>
<p>Download URLs from file.</p>
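<p>-i expects one URL per line. This sketch generates the list in the shell first; the URL pattern is hypothetical. It uses -nc so that rerunning the command skips files already downloaded. -i - reads the list from standard input instead.</p>
<pre><code># Build a list of seven daily log URLs (hypothetical naming scheme).
for day in 01 02 03 04 05 06 07; do
  echo "https://example.com/logs/2024-01-$day.log.gz"
done > urls.txt

# Download each URL into ./logs; -nc skips files that already exist locally.
wget -nc -P logs -i urls.txt

# Equivalent, reading URLs from standard input:
# cat urls.txt | wget -nc -P logs -i -</code></pre>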
<h3>Spider Mode</h3>
<pre><code>wget --spider https://example.com/file.zip</code></pre>
<p>Check if file exists (no download).</p>
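<p>wget reports failures through its exit status, so --spider fits well in scripts. The codes used here are 0 for success, 4 for a network failure, and 8 when the server answers with an error such as 404. Here is a minimal link-check sketch; the URL list is hypothetical.</p>
<pre><code># Classify each URL by wget's exit status without downloading it.
check_url() {
  wget -q --spider --tries=1 --timeout=10 "$1"
  case $? in
    0) echo "OK       $1" ;;
    4) echo "NETWORK  $1" ;;   # network failure, e.g. DNS or connection error
    8) echo "HTTP-ERR $1" ;;   # server answered with an error status, e.g. 404
    *) echo "OTHER    $1" ;;
  esac
}

for url in https://example.com/file.zip https://example.com/missing.zip; do
  check_url "$url"
done > spider-report.txt
cat spider-report.txt</code></pre>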
<h3>Ignore Robots</h3>
<pre><code>wget -e robots=off -r https://example.com</code></pre>
<p>Ignore robots.txt.</p>
<h3>Custom User Agent</h3>
<pre><code>wget -U "Mozilla/5.0 (Windows NT 10.0)" https://example.com</code></pre>
<p>Spoof browser user agent.</p>
<hr>
<h2>Mirror Options Explained</h2>
<pre><code>wget -m -k -p -E -np https://example.com</code></pre>
<ul>
<li>-m (--mirror) - Mirror mode (-r -N -l inf --no-remove-listing)</li>
<li>-k (--convert-links) - Convert for local viewing</li>
<li>-p (--page-requisites) - Get images, CSS, JS</li>
<li>-E (--adjust-extension) - Save as .html</li>
<li>-np (--no-parent) - Don't go to parent directory</li>
</ul>
<hr>
<h2>wget vs curl</h2>
<ul>
<li>wget - Better for plain downloads, recursive retrieval, and mirroring</li>
<li>curl - Better for API calls; supports more protocols and finer control over requests</li>
</ul>
<hr>
<h2>Tips</h2>
<ul>
<li>Use -c to resume interrupted downloads</li>
</ul>
<ul>
<li>Use -b for large files (runs in background)</li>
</ul>
<ul>
<li>Use --limit-rate to be polite to servers</li>
</ul>
<ul>
<li>Use -np to stay within target directory</li>
</ul>
<ul>
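<li>Use -w with --random-wait to avoid being blocked (--random-wait only varies an existing wait)</li>
</ul>
<ul>
<li>After -b, follow progress with tail -f wget-log</li>
</ul>
<ul>
<li>Scripts can branch on wget's exit status, which is non-zero on failure</li>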
</ul>
<ul>
<li>Be respectful when mirroring sites</li>
</ul>
<hr>
<p><a href="index.html">← Back to cheatsheets</a></p>
<p><a href="../index.html">← Home</a></p>
</div>
</div>
</div>
</div>
<script async type="text/javascript" src="../blog/analytics.js"></script>
<script src="../theme.js"></script>
</body>
</html>