Month: April 2020

Installing Docker CE on Ubuntu 20.04

Main steps:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu eoan stable"

References:

  • https://www.fosslinux.com/14228/how-to-install-and-configure-docker-ce-on-ubuntu-18-04-lts.htm

Alibaba Cloud documentation crawler

Download by product

import requests
import re
import os
import time
from bs4 import BeautifulSoup

def GetPage(url):
    page = requests.get(url)
    html = page.text
    return html

html_doc = GetPage('https://www.alibabacloud.com/help/zh')

# Build the list of product page links

soup = BeautifulSoup(html_doc, 'html.parser')
index_url = []
baseurl = 'https://www.alibabacloud.com'

for k in soup.find_all(href=re.compile("product")):
    index_url.append(baseurl + k.get('href'))


def download_pdf(product_url):

    html_doc = GetPage(product_url)
    soup = BeautifulSoup(html_doc, 'html.parser')

    def get_product_name():
        """ Return the product name, e.g. 云服务器_ECS """
        product_names = soup.find('h3', class_="product-name")
        for name in product_names.strings:
            product_name = name

        return product_name.replace(' ', '')

    def get_pdf_title():
        """ Return the PDF title, e.g. 新功能发布记录 """
        product_names = soup.find(class_="download-pdf")
        # print(product_names.parent)
        product_feature = product_names.parent
        pdf_title = product_feature.h3.string

        return pdf_title.replace(' ', '')

    def get_pdfs_file():
        pdf_names = soup.find_all(class_="download-pdf")
        for pdfs in pdf_names:
            product_name = get_product_name()
            product_feature = pdfs.parent
            pdf_title = product_feature.h3.string
            pdf_url_tag = product_feature.a
            pdf_filename = product_name + '_' + pdf_title.replace(' ', '') + '.pdf'
            pdf_url_str = str(pdf_url_tag.get('href'))

            if pdf_url_str.startswith('//'):
                pdf_url_str = ('http:' + pdf_url_str)

            if not os.path.exists(product_name):
                os.makedirs(product_name)

            # print(pdf_filename)
            # print(pdf_url_str)

            # Quote the URL as well, so characters such as & do not break the shell command
            wget_cmd = 'wget "' + pdf_url_str + '" -O "' + product_name + '/' + pdf_filename + '"'
            print(wget_cmd)
            os.system(wget_cmd)
            time.sleep(1)

    get_pdfs_file()

# Test run

for products in index_url:
    download_pdf(products)
    time.sleep(1)

# download_pdf('https://www.alibabacloud.com/help/zh/product/147291.htm')
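The script above shells out to wget through os.system. As a rough alternative, here is a sketch that does the same work with the Python standard library only. The names normalize_url, pdf_path and download are hypothetical helpers, not part of the original script, and defaulting to https (the script prepends http:) is an assumption.

```python
import os
import re
import shutil
import urllib.request


def normalize_url(href, scheme="https"):
    """Turn a protocol-relative link (//host/path) into an absolute URL."""
    if href.startswith("//"):
        return scheme + ":" + href
    return href


def _clean(text):
    """Drop whitespace and slashes so the text is safe as a file name part."""
    return re.sub(r"[\s/]+", "", text)


def pdf_path(product_name, pdf_title):
    """Build '<product>/<product>_<title>.pdf', mirroring the script's naming."""
    product = _clean(product_name)
    return os.path.join(product, product + "_" + _clean(pdf_title) + ".pdf")


def download(url, dest):
    """Stream url to dest, creating the parent folder when needed."""
    folder = os.path.dirname(dest)
    if folder:
        os.makedirs(folder, exist_ok=True)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Inside get_pdfs_file, the wget lines could then become download(normalize_url(pdf_url_str), pdf_path(product_name, pdf_title)), which avoids building a shell command from page content.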

Too many products: move them into folders by category

import requests
import re
import os
import time
from bs4 import BeautifulSoup

def GetPage(url):
    page = requests.get(url)
    html = page.text
    return html

html_doc = GetPage('https://www.alibabacloud.com/help/zh')

soup = BeautifulSoup(html_doc, 'html.parser')
keys = soup.find_all(class_="masonry-list")

for key in keys:
    category_dl = key.parent
    category = category_dl.dt.h2.string     # Product category

    product_code = key.find('a')
    product_text = product_code.string
    product = product_text.replace(' ', '')     # Product name

    if not os.path.exists(category):
        os.makedirs(category)

    # Quote both names so a space in the category does not split the mv arguments
    cmd = 'mv "' + product + '" "' + category + '"'
    os.system(cmd)
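The move step can also be done without the shell. The following is a minimal sketch using shutil.move; move_into_category and its root parameter are hypothetical names introduced here, and skipping products that were never downloaded is an assumption about the desired behavior.

```python
import os
import shutil


def move_into_category(product, category, root="."):
    """Move root/product into root/category/, creating the category folder if needed."""
    src = os.path.join(root, product)
    dest_dir = os.path.join(root, category)
    if not os.path.isdir(src):
        return False  # nothing was downloaded for this product
    if os.path.abspath(src) == os.path.abspath(dest_dir):
        return False  # product and category share a name; leave the folder alone
    os.makedirs(dest_dir, exist_ok=True)
    shutil.move(src, os.path.join(dest_dir, product))
    return True
```

In the loop above, move_into_category(product, category) would replace both the os.makedirs call and the mv command.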

Ulysses to Yuque import script

2020-03-26 11:59:15

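The script handles spaces in the title. It cannot yet handle file names that contain spaces, or a tiff file and a png file that share the same name: converting same-named files causes a conflict and one image is lost. The file-renaming logic still needs to be written.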

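Approach: walk the md file, collect every image file name, decode the names to get a list of files, give each image file a random name, then write the new names back into the md file.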
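The approach above can be sketched in Python roughly as follows. This is only an illustration, not the script used in this post: rename_images is a hypothetical function, the regular expression assumes link targets of the form ](name) with no title part, and uuid4 is one possible way to pick the random names.

```python
import os
import re
import uuid
from urllib.parse import unquote

# Matches the target of a Markdown link or image: ](target)
LINK = re.compile(r"\]\(([^)\n]+)\)")


def rename_images(md_text, folder, extensions=(".tiff", ".png")):
    """Give every referenced image a random name and rewrite the links.

    Returns the new Markdown text and the {old name: new name} mapping.
    """
    mapping = {}

    def repl(match):
        name = unquote(match.group(1))  # my%20pic.tiff -> my pic.tiff
        if not name.lower().endswith(extensions):
            return match.group(0)  # not an image link, keep it unchanged
        if name not in mapping:
            ext = os.path.splitext(name)[1]
            mapping[name] = uuid.uuid4().hex + ext
        return "](" + mapping[name] + ")"

    new_text = LINK.sub(repl, md_text)

    # Rename the files on disk to match the rewritten links
    for old, new in mapping.items():
        src = os.path.join(folder, old)
        if os.path.exists(src):
            os.rename(src, os.path.join(folder, new))
    return new_text, mapping
```

Because each file gets its own random stem, a tiff and a png that used to share a name no longer collide when the tiff is converted, and the new names contain no spaces.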

Add a check for tiff2png: if it is not found, exit and tell the user to install it with brew.
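One way to sketch this check in Python is with shutil.which. The helper name require_tool and the wording of the hint are hypothetical; the post does not give the exact brew command, so none is assumed here.

```python
import shutil
import sys


def require_tool(name, hint):
    """Return the full path of an executable, or exit with a hint if it is missing."""
    path = shutil.which(name)
    if path is None:
        sys.exit(name + " not found; " + hint)
    return path


# Example call (commented out so that importing this sketch has no side effects):
# require_tool("tiff2png", "install it with Homebrew first")
```

sys.exit with a string prints the message to stderr and exits with status 1, which matches the exit 1 used elsewhere in the script.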

Ideas for handling spaces in file names:

https://stackoverflow.com/questions/2709458/how-to-replace-spaces-in-file-names-using-a-bash-script

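Renaming the files themselves is easy. The hard part is the file links inside the md file that contain spaces: a plain find-and-replace may cause other problems.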
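To avoid touching spaces in ordinary prose, the substitution can be limited to link targets. A minimal sketch, assuming targets look like ](name) and carry no title part; fix_link_spaces is a hypothetical name, and using an underscore as the replacement is an assumption:

```python
import re

# Capture only what sits between "](" and ")"
TARGET = re.compile(r"(\]\()([^)\n]*)(\))")


def fix_link_spaces(md_text, replacement="_"):
    """Replace spaces (raw or %20-encoded) inside link targets only."""

    def repl(match):
        target = match.group(2).replace("%20", " ")
        return match.group(1) + target.replace(" ", replacement) + match.group(3)

    return TARGET.sub(repl, md_text)
```

The files on disk would need the same renaming so that the rewritten links still resolve.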

#!/bin/bash

if [ -f "index.md" ]; then
    echo "converting..."
else 
    exit 1
fi

# Convert tiff images to png
tiff2png -compression 5 *.tiff
rm -f *.tiff

# Rewrite image links in the body with regular expressions
sed -i '' 's/\[\](/[img](\//g; s/\.tiff/\.png/g' index.md

# Get the article title
title=$(head -n 1 index.md | sed 's/#* //')

# Generate the table of contents, SUMMARY.md
cat << EOF > SUMMARY.md
# Summary

* [$title](README.md)
EOF

# Rename the file
mv index.md README.md

# Package everything into a zip file
zip -urq yuque_gitbook_import.zip . -x '*_book*' -x '*.DS_Store*'

How to Add User to Sudoers in Ubuntu

https://linuxize.com/post/how-to-add-user-to-sudoers-in-ubuntu/

echo "ubuntu  ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ubuntu

