Sunday, December 27, 2020

DPM

DPM troubleshooting

This post collects some notes on DPM.

Permission denied when running xrdcp

/var/log/xrootd/dpmdisk/xrootd.log

190219 06:59:31 22192 secgsi_GetSrvCertEnt: failed to load certificate for the issuing CA 'b459ca48.0|9cd75e87.0

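The contents of /etc/grid-security/certificates are wrong; copying the directory over from another machine fixes it.
It may also be that the shared key does not match across nodes; check with md5sum.
Or: the permissions on /data01 on the disk node are wrong.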
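The md5sum comparison of the shared key can also be scripted. A minimal sketch, assuming the key file from each node has been copied to a local path; the file names in the usage note are hypothetical and not taken from these notes:

```python
import hashlib
from pathlib import Path


def md5_of(path):
    """Return the hex MD5 digest of a file, as `md5sum` would print it."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def keys_match(paths):
    """True when every file in `paths` has the same MD5 digest."""
    digests = {md5_of(p) for p in paths}
    return len(digests) == 1
```

For example, `keys_match(["head.key", "disk1.key"])` would return False if the two copies differ.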

DN has NOT been authorized

dmlite.log:
dmlite dome processreq : DN '/C=TW/O=AS/OU=GRID/CN=cdvm2.twgrid.org' has NOT been authorized.
This is not really a problem; running fsadd inside dmlite-shell resolves it.

Host is not trusted, identity provided was (ID,“dpmmgr”)

An entry needs to be added to /etc/hosts.

Could not connect! Can’t connect to local MySQL server through socket

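This error occurred on a disk node.
I was converting cdvm2, which used to be the head node, into a disk node, but forgot to clear /etc/dmlite.conf.d/mysql.cfg.
After deleting that file, everything worked normally.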

Sometimes works, sometimes fails

  • Try dpm-rmfs
  • The permissions on the disk node are misconfigured

Permission refused

At this point you are usually just one step away; try regenerating the mapfile without --safe.

Hostname does not match the certificate

Configure DNS or /etc/hosts.

Cannot verify AC signature! (or permission denied at gridftp)

A valid proxy was not used,
or
/etc/grid-security/vomsdir/ is not configured correctly.

DOME

  • Remember to upgrade the MySQL server
  • In /etc/xrootd/xrootd-dpmredir|xrootd-dpmdisk, enable httpd and install the missing files
  • Configure /etc/dmlite.conf.d/domeadapter.conf, install the missing files, and remove adapter.conf
  • Configure /etc/domehead.conf|domedisk.conf
  • Configure /etc/sysconfig/dpminfo

rfio shows up in dmlite.log

Check the previous section (DOME). For example, with a log like this:

1.7.1548810776 at #012[bt]: (3) /usr/lib64/dmlite/plugin_adapter.so : dmlite::StdRFIOHandler::StdRFIOHandler(std::string const&, int, unsigned int)+0x620 [0x7fefcc577700]#012[bt]: (4) /usr/lib64/dmlite/plugin_adapter.so : dmlite::StdRFIODriver::createIOHandler(std::string const&, int, dmlite::Extensible const&, unsigned int)+0x1be [0x7fefcc577aae]#012[bt]: (5) /lib64/libdmlite.so.0 : dmlite_fopen+0x1a1 [0x7fefd3846f41]#012[bt]: (6) /usr/lib64/httpd/modules/mod_lcgdm_disk.so : +0x658c [0x7fefd043058c]

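Run grep -r "/usr/lib64/dmlite/plugin_adapter.so" to find what is loading it; /usr/lib64/dmlite/plugin_domeadapter.so should be used instead. In this case the httpd configuration is the likely culprit; fix it once found.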
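The same search can be done from Python when the hits need further processing. A rough sketch, assuming the configuration lives in plain-text files under a directory such as /etc/httpd (that starting directory is my assumption, based on the note that httpd is the likely culprit):

```python
from pathlib import Path


def find_references(root, needle="plugin_adapter.so"):
    """Yield (file, line number, line) for each line under `root` containing `needle`."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip it and move on
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()
```

Usage: `for hit in find_references("/etc/httpd"): print(*hit)`.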

Permission refuse

VOMS, the certificate, or the mapfile is not configured correctly.

permission denied

The directory permissions are wrong; they should be set through dmlite (e.g. mkdir, quotatokenset).

HTTP 409 : Conflict, File Exist

The quotatoken is not set up,
or IPv6 is not configured correctly (davix failed).

No space left on device

The quotatoken size is too small; change it with quotatokenmod <quotatoken id> path <path> pool <pool> size <x GB>.
Note that the unit is GB, not G.

Failed at step RUNTIME_DIRECTORY spawning /usr/bin/xrootd: File exists

/usr/lib/tmpfiles.d/xrootd.conf is misconfigured; changing xrootd to dpmmgr in that file fixes it.

Intermittent failures

One of the nodes may be broken; log in to that node and investigate.

Error: Could not connect to server

The firewall is not configured correctly.

Error when parsing json

-[#00.000022] Error when parsing json response: {
}
There is no pool.

Error: HTTP 400 : Server Error

Server-side error (/var/log/httpd/ssl_error_log): AH01964
The hostname may be the problem (it must not contain underscores).
LogLevel needs to be changed (/etc/httpd/conf/httpd.conf and /etc/httpd/conf.d/zlcgdm-dav.conf).
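A quick way to catch the underscore problem before deploying is to validate the hostname against the usual letters-digits-hyphens rule for DNS labels. A small sketch; the hostname `my_host.twgrid.org` in the usage note is a made-up example:

```python
import re

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def is_valid_hostname(name):
    """Return True if every dot-separated label uses only letters, digits and hyphens."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL.match(label) for label in name.rstrip(".").split("."))
```

`is_valid_hostname("cdvm2.twgrid.org")` passes, while `is_valid_hostname("my_host.twgrid.org")` is rejected because of the underscore.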

Missing token on pfn: /domehead/command/dome_getidmap

Check files such as /etc/dmlite.conf.d/domeadapter.conf; the head node port may not be set, so the data connection ends up at httpd instead of xrootd.
This also triggers the error: [3005] Unable to close file ; invalid argument dpm cern

fd is a NULL pointer at , Could not open…

SELinux may still be enabled; check it.

Cannot stat lfn: ‘/dpm/twgrid.org’ err: 2 what: ‘[#00.000002] Entry ‘twgrid.org’ not found under ‘/dpm’’ and no volatile filesystem matches.

mariadb may not have been shut down.

SSL handshake failed: Connection timed out during SSL handshake’. No response to show

MTU => 1500?
Later, setting it back to 9000 made it work again, for no apparent reason.

xrootd does not start

  • Check the permissions on tmpfs
  • It may be: Plugin version XrdOss v4.8.5 is incompatible with XrdDPMOss v4.9.0 (must be <= 4.8.x) in osslib libXrdDPMOss.so-4.3

httpd Unregistered Authentication Agent for unix-process:1669:9621 (system bus name :1.20, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)

The CA certificates may not be installed.
Installation instructions: https://wiki.egi.eu/wiki/EGI_IGTF_Release

xrootd fails to start

190412 08:52:45 13452 XrdProtocol: Protocol XrdHttp could not be loaded
------ xrootd dpmredir@cddh.twgrid.org:-1 initialization failed.
190412 08:55:55 13487 Starting on Linux 3.10.0-862.14.4.el7.x86_64

The MySQL user's privileges may not be set up correctly.

[dmlite/dmlite.log] No useable identity provided

This happens right after installation; wait a while and then restart.

gridftp

Make sure port 80/tcp is open in the firewall.

srmv2.2

DESTINATION SRM_PUT_TURL error on the turl

When an unexplained DN shows up in the dmlite log:
service dpm restart

Error: Error while copying to SFN: f-dpmp23.grid.sinica.edu.tw:/data01/atlas/2019-05-23/dd_50M.1558585079.8.1558591240 with error: Error Contacting the remote disknode

The adapter plugin was not disabled,
which causes errors during drainfs.

OPS

20190506

smstor45 high swap usage => RAID card GUI process (uses Java)

=> systemd-journald => comment out everything in /etc/systemd/journald.conf

find /proc -maxdepth 2 -path "/proc/[0-9]*/status" -readable -exec awk -v FS=":" '{process[$1]=$2;sub(/^[ \t]+/,"",process[$1]);} END {if(process["VmSwap"] && process["VmSwap"] != "0 kB") printf "%10s %-30s %20s\n",process["Pid"],process["Name"],process["VmSwap"]}' '{}' \;
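The one-liner above reads `VmSwap` from every /proc/<pid>/status file. A roughly equivalent Python sketch, which may be easier to extend (sorting, thresholds); it assumes a Linux /proc layout and takes the root directory as a parameter only so that it can be tried against a copy:

```python
from pathlib import Path


def parse_status(text):
    """Parse the 'Key:<tab>value' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def swap_users(proc_root="/proc"):
    """Return (pid, name, swap) tuples for processes with non-zero VmSwap."""
    result = []
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            fields = parse_status(status.read_text())
        except OSError:
            continue  # the process exited or the file is not readable
        swap = fields.get("VmSwap")
        if swap and swap != "0 kB":
            result.append((fields.get("Pid", "?"), fields.get("Name", "?"), swap))
    return result
```

Usage: `for pid, name, swap in swap_users(): print(f"{pid:>10} {name:<30} {swap:>20}")`.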

Token does not validate

Jun 11 06:37:11 f-dpmp1234 httpd[32387]: {139943286126336}!!! dmlite setMessage : DmException(…):[#00.000013] Missing token on pfn: /.noindex.html at #012[bt]: (3) /usr/lib64/dmlite/plugin_domeadapter.so : dmlite::DomeIODriver::createIOHandler(std::string const&, int, dmlite::Extensible const&, unsigned int)+0xae4 [0x7f4717d507f4]#012[bt]: (4) /lib64/libdmlite.so.0 : dmlite_fopen+0x249 [0x7f471e178bb9]#012[bt]: (5) /usr/lib64/httpd/modules/mod_lcgdm_disk.so : +0x70d8 [0x7f471ab400d8]#012[bt]: (6) /usr/lib64/httpd/modules/mod_lcgdm_dav.so : +0x455f [0x7f471e64055f]
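The cause is an IPv6 misconfiguration.

The client first asks the head node for the data, and the head node builds a token from the client's IP address.
If both the head node and the disk node are IPv4-only, there is no problem.
If the head node is on IPv4 and the disk node on IPv6, a token built from the IPv4 address is presented to a machine that sees the client's IPv6 address, and validation fails.
Disabling IPv6 on the disk node resolves it.
In short, the disk node and the head node must be consistent.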
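To make the mismatch concrete, here is an illustration with a made-up token scheme: an HMAC over the client IP and the file path. This is not dmlite's actual algorithm; the function names, the message layout and the example addresses are all assumptions for demonstration:

```python
import hashlib
import hmac


def make_token(secret, client_ip, pfn):
    """Hypothetical token: an HMAC over the client's IP address and the file path."""
    message = f"{client_ip}|{pfn}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def disk_accepts(secret, token, seen_ip, pfn):
    """The disk node recomputes the token from the address it sees the client coming from."""
    return hmac.compare_digest(token, make_token(secret, seen_ip, pfn))
```

If the head node issues `make_token(key, "192.0.2.10", pfn)` but the same client reaches the disk node from `2001:db8::10`, `disk_accepts` returns False even though the shared key is identical.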

Timeout during drain

Port 80 must be open in iptables.

No useable identity provided

fetch-crl

qryconf reports the wrong capacity

The hostname is the problem.

Error: The file is pinned

Wait a few hours; the file may be unpinned automatically.

install new disk-node

  • dpm setup script ( testing on vm )
  • dome setup script
  • ipv6 setting
    • nmcli con
  • hostname setting
    • hostnamectl
  • zfs
    • iozone
      • download and compile from source
    • list all devices: ls -altrh /dev/disk/by-id/ | grep wwn | grep -v sdbj | awk '{print $(NF-2)}' > /tmp/disk.txt
    • cat /tmp/disk.txt | awk 'BEGIN{print "zpool create -o ashift=12 data01 \\"} NR<57{if((NR-1)%7==0){printf("%s ", "raidz1")};printf("%s ", $0);if(NR%7==0){if(NR!=56){printf("\\\n")}else{printf("\n")}}} NR>56{if(NR==57){printf("zpool add data01 spare ")}printf("%s ",$0)}' > ~/cdli/create.sh
    • bash ~/cdli/create.sh
    • parted
    • zpool add log
    • [root@hpstor14 cdli]# echo "730000000000" > /sys/module/zfs/parameters/zfs_arc_min
      [root@hpstor14 cdli]# echo "760000000000" > /sys/module/zfs/parameters/zfs_arc_max
      [root@hpstor14 cdli]# echo "760000000000" > /sys/module/zfs/parameters/zfs_arc_meta_limit
      [root@hpstor14 cdli]# echo "730000000000" > /sys/module/zfs/parameters/zfs_arc_meta_min
      [root@hpstor14 cdli]# zfs set atime=off data01
      [root@hpstor14 cdli]# zfs set compression=lz4 data01
      [root@hpstor14 cdli]# zfs set mountpoint=/data01 data01
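The awk step in the list above builds create.sh from the disk list. The same idea in Python, as a sketch under my reading of that script: the first 56 disks go into raidz1 groups of 7 and any remaining disks become spares. The group size, the function name and its parameters are assumptions, not something the notes state explicitly:

```python
def build_zpool_commands(disks, group_size=7, data_disks=56, pool="data01"):
    """Return the text of a create script: raidz1 groups first, spares after."""
    data, spares = disks[:data_disks], disks[data_disks:]
    lines = [f"zpool create -o ashift=12 {pool} \\"]
    groups = [data[i:i + group_size] for i in range(0, len(data), group_size)]
    for index, group in enumerate(groups):
        last = index == len(groups) - 1
        # Every line but the last ends with a backslash so the shell continues it.
        lines.append("raidz1 " + " ".join(group) + ("" if last else " \\"))
    if spares:
        lines.append(f"zpool add {pool} spare " + " ".join(spares))
    return "\n".join(lines) + "\n"
```

Feeding it the lines of /tmp/disk.txt and writing the result to a file gives a script to review by hand before running it with bash.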

drain does not respond

-Parameter(s): hpstor11.grid.sinica.edu.tw, /data01, dryrun, false
Restart rfiod, and remember to put shift.conf in place!
dpm must be restarted too!

‘NSS: private key from file not found’

It seems to go away by itself after a while? Just wait…
Apart from that, specifying threads = 1 helps.

with error: The Headnode Dav server reported error 403 when issuing the copy

fetch-crl
/usr/libexec/edg-mkgridmap/edg-mkgridmap.pl --conf=/etc/lcgdm-mkgridmap.conf --output=/etc/lcgdm-mapfile

Davix

The PROPFIND method returns a file's metadata; the response looks roughly like this:

<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:">
<D:response xmlns:lcgdm="LCGDM:" xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/" xmlns:lp3="LCGDM:">
<D:href>/dpm/grid.sinica.edu.tw/home/atlas/atlasdatadisk/rucio/data17_13TeV/57/40/DAOD_EGAM2.20516709._000082.pool.root.1</D:href>
<D:propstat>
<D:prop>
<lcgdm:checksum.adler32>7569f3d3</lcgdm:checksum.adler32><lp1:resourcetype/>
<lp1:creationdate>2020-02-09T03:14:43Z</lp1:creationdate><lp1:getlastmodified>Sun, 09 Feb 2020 03:14:41 GMT</lp1:getlastmodified><lp3:lastaccessed>Sun, 09 Feb 2020 06:22:13 GMT</lp3:lastaccessed><lp1:getetag>d870f44-5e3f7921</lp1:getetag><lp1:getcontentlength>115902905</lp1:getcontentlength><lp1:displayname>DAOD_EGAM2.20516709._000082.pool.root.1</lp1:displayname><lp1:getcontenttype>application/x-troff-man</lp1:getcontenttype><lp1:executable>F</lp1:executable><lp2:executable>F</lp2:executable><lp1:iscollection>0</lp1:iscollection><lp3:guid></lp3:guid><lp3:mode>0100664</lp3:mode><lp3:sumtype>AD</lp3:sumtype><lp3:sumvalue>7569f3d3</lp3:sumvalue><lp3:fileid>226955076</lp3:fileid><lp3:status></lp3:status><lp3:xattr>{"checksum.adler32": "7569f3d3"}</lp3:xattr><lp1:owner>3109</lp1:owner><lp1:group>110</lp1:group></D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
</D:multistatus>
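A response like the one above can be parsed with the standard library. A minimal sketch that extracts the path, size and adler32 checksum; the sample below is a shortened version of the response with a placeholder file name, and the function name is my own:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:">
<D:response xmlns:lcgdm="LCGDM:" xmlns:lp1="DAV:" xmlns:lp3="LCGDM:">
<D:href>/dpm/grid.sinica.edu.tw/home/atlas/example.root</D:href>
<D:propstat>
<D:prop>
<lcgdm:checksum.adler32>7569f3d3</lcgdm:checksum.adler32>
<lp1:getcontentlength>115902905</lp1:getcontentlength>
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
</D:multistatus>"""


def parse_propfind(xml_bytes):
    """Return one dict per <D:response> with href, size and adler32 checksum."""
    root = ET.fromstring(xml_bytes)
    results = []
    for response in root.iter("{DAV:}response"):
        size = next(response.iter("{DAV:}getcontentlength"), None)
        checksum = next(response.iter("{LCGDM:}checksum.adler32"), None)
        results.append({
            "href": response.findtext("{DAV:}href"),
            "size": int(size.text) if size is not None else None,
            "adler32": checksum.text if checksum is not None else None,
        })
    return results
```

Usage: `parse_propfind(SAMPLE.encode("utf-8"))`. The tags are matched by namespace URI (`DAV:`, `LCGDM:`), so the `lp1`/`lp3` prefixes chosen by the server do not matter.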

HTTP-TPC does not work

timeout => check whether /var/www/proxycache has been set up correctly

could not connect to remote node

Check the /etc/gridftp.conf settings on both the head node and the disk node.

Too many connection…

/etc/my.cnf is not being read.
Add this line to /usr/lib/systemd/system/mariadb.service:

ExecStart=/usr/bin/mysqld_safe --defaults-file=/etc/my.cnf --basedir=/usr

eymir.dmlite.log:May 11 10:38:08 eymir globus-gridftp-server[720]: {140281692354304}[0] dmlite Memcache MemcacheFactory : MemcacheFactory started.

/etc/dmlite.conf.d/zmemcache.conf was enabled; empty it.

FTS transfer error

TRANSFER [70] TRANSFER globus_ftp_client: the server responded with an error 500 500-Command failed. : globus_ftp_control_data_write failed. 500-globus_ftp_control_data_write(): Handle not in proper state. PORT 500- 500 End.

IPv6 may be misconfigured; check whether everything is on the same server.

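Looking up an IP

For site-to-site transfers, search by the IPv6 address;
for everything else, search by the IPv4 address. Searching by hostname does not work.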

TRANSFER [2] SOURCE SRM_GET_TURL error on the turl request : [SE][StatusOfGetRequest][SRM_INVALID_PATH]

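On the transfer dashboard, some of the errors are concentrated on particular machines; this may be related to network or IPv6 failures. Restarting the services on those machines resolves it.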

SRM unusable after a network problem

  1. Restart the services on the SRM machine
  2. Restart the legacy services on the head node (dpns & dpm)

Wednesday, December 9, 2020

On adjusting to each other

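This article makes a lot of sense, so I am quoting it here.

Once you realize the other person is testing you, you do not need to think about passing at all: if you are already what she wants, your true nature will speak for itself; if you are not, then stay together if it works and part ways if it does not.

The reason things did not go well this time ultimately comes down to the mindset the article describes. I forced myself to accommodate for the sake of an outcome and overlooked too many things. I hope to do better in the future.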

Thursday, September 17, 2020

climb 5.12a

Original article

This looks well worth following; I am currently working toward 11a.

Excerpt from the General Guidelines section
Climb

  • Establish a weekly training schedule and stick to it.
    Set up a weekly training plan and actually carry it out.

  • Climb 2 to 4 days/week; never more than 2 days in a row.
    Climb two to four days a week, with no more than two consecutive days (e.g. Monday, Tuesday, Thursday, Saturday).

  • Warm up with light aerobic exercise, dynamic stretching, and easy climbing.
    Warm up with light aerobic exercise, dynamic stretching and easy routes.

  • Take at least 1 day of total rest each week.
    Take one full rest day a week.

  • Focus on holds, angles, and moves encountered in Ten Sleep; use your route’s beta to guide training route setting.
    Study the target route (Ten Sleep here is a 5.12a route).

  • Spend 1 or 2 first-days-on (your first day climbing after a rest) bouldering each week.
    On the first day back after a full rest, do bouldering.

  • Incorporate 4x4 power-endurance training 1x/week. Climb 4 12- to 20-move boulder problems 4 times each, with 1 to 5 minutes of rest between each problem.
    Once a week, do 4x4 power-endurance training: four boulder problems of 12-20 moves, four times each, with 1-5 minutes of rest in between.

  • Incorporate high-intensity endurance training 1x to 2x/ week. Climb 3 to 7 routes with 20 to 25 pumpy moves to a resting hold. Shake out and recover, then climb for another 15 to 20 moves.
    Once or twice a week, do high-intensity endurance training: climb 3-7 long routes of 20-25 moves, rest briefly on the wall, then keep climbing (climbing up and down several times in the gym should count).

Strength

  • Weight train 2x/week right after climbing or the day after; don’t climb to exhaustion and then weight train.
    Do weight training right after climbing or the next day, twice a week; avoid climbing to exhaustion beforehand.

  • Rest 2 days between each weight session.
    Rest two days between weight sessions.

  • Day 1: Pull-ups/lat pull-downs; tricep push-downs/dips; rows; wrist curls; reverse wrist curls.
    Day 1: pull-ups/lat pull-downs (back); push-downs (triceps); wrist curls and reverse wrist curls (forearms).

  • Day 2: Deadlifts; squats; bench presses/push-ups; military presses; captain’s chair leg lifts.
    Day 2: deadlifts (lower back, glutes); bench presses/push-ups (chest); military presses; captain’s chair leg lifts (core).

  • Rest 3 minutes between sets of same exercise.
    Rest three minutes between sets.

  • Do weighted finger curls 1x to 2x/week. See “Efficient Finger Training” sidebar.
    Train finger strength once or twice a week.

  • The last week of each month, do 1 set (lighter reps) of each exercise, and take extra rest days.

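Friday, May 22, 2020

米卡茲 breakfast shop at Yang-Ming University

The fried chicken nuggets are especially good.
The shopkeeper's daughter was sound asleep on a chair, which was adorable.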

Tuesday, May 19, 2020

Passwordless SSH login

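Small details of setting up passwordless SSH login

There are plenty of tutorials online about passwordless login, but sometimes the server still asks for a password even after you follow them. This happens because some permissions are not set correctly.

The sshd man page covers the key points: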

 ~/.ssh/
     This directory is the default location for all user-specific
     configuration and authentication information.  There is no general
     requirement to keep the entire contents of this directory secret,
     but the recommended permissions are read/write/execute for the
     user, and not accessible by others.

 ~/.ssh/authorized_keys
     Lists the public keys (DSA, ECDSA, Ed25519, RSA) that can be used
     for logging in as this user.  The format of this file is
     described above.  The content of the file is not highly sensitive,
     but the recommended permissions are read/write for the user, and
     not accessible by others.

     If this file, the ~/.ssh directory, or the user's home directory
     are writable by other users, then the file could be modified or
     replaced by unauthorized users.  In this case, sshd will not
     allow it to be used unless the StrictModes option has been set to
     "no".

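In other words, the SSH-related paths must be set up as follows:

$HOME/.ssh/ should have mode 700
$HOME/.ssh/authorized_keys should have mode 600
$HOME is usually fine by default, but make sure other users cannot write to it

If any of these is violated, you will still be asked for a password.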
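These checks are easy to automate. A minimal sketch that flags the condition the man page describes (paths writable by group or others); the function name is my own, and it only tests the write bits rather than enforcing the recommended 700/600 exactly:

```python
import os
import stat


def check_ssh_permissions(home):
    """Return a list of problems that would make sshd (StrictModes) ignore authorized_keys."""
    problems = []
    ssh_dir = os.path.join(home, ".ssh")
    targets = [home, ssh_dir, os.path.join(ssh_dir, "authorized_keys")]
    for path in targets:
        try:
            mode = stat.S_IMODE(os.stat(path).st_mode)
        except FileNotFoundError:
            problems.append(f"{path}: missing")
            continue
        # Group- or world-writable paths are what StrictModes rejects.
        if mode & (stat.S_IWGRP | stat.S_IWOTH):
            problems.append(f"{path}: writable by others (mode {mode:o})")
    return problems
```

Usage: `check_ssh_permissions(os.path.expanduser("~"))` returns an empty list when everything is in order.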

Wednesday, March 11, 2020

python tricks

Python tricks quick reference

pdb/ipdb

https://docs.python.org/3/library/pdb.html
https://github.com/gotcha/ipdb
pdb

import pdb; pdb.set_trace()

ipdb

import ipdb
ipdb.set_trace()

list comprehension

>>> [i for i in range(10) if i != 5]

output

[0, 1, 2, 3, 4, 6, 7, 8, 9]

List an object's properties

>>> vars(<object>)

f-string

PEP 498

>>> a = 1234
>>> print(f"a = {a}")
a = 1234

yield - send

asyncio

PEP 492
https://docs.python.org/3/library/asyncio-task.html#coroutines
simple example

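import asyncio
import time

async def say_after(delay, what):
    # Sleep without blocking the event loop, then print the message.
    await asyncio.sleep(delay)
    print(what)

async def main():
    task1 = asyncio.create_task(
        say_after(1, 'hello'))

    task2 = asyncio.create_task(
        say_after(2, 'world'))

    print(f"started at {time.strftime('%X')}")

    # Wait until both tasks are completed (should take
    # around 2 seconds.)
    await task1
    await task2

    print(f"finished at {time.strftime('%X')}")

asyncio.run(main())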

Running Tasks Concurrently

import asyncio

async def factorial(name, number):
    f = 1
    for i in range(2, number + 1):
        print(f"Task {name}: Compute factorial({i})...")
        await asyncio.sleep(1)
        f *= i
    print(f"Task {name}: factorial({number}) = {f}")

async def main():
    # Schedule three calls *concurrently*:
    await asyncio.gather(
        factorial("A", 2),
        factorial("B", 3),
        factorial("C", 4),
    )

asyncio.run(main())

# Expected output:
#
#     Task A: Compute factorial(2)...
#     Task B: Compute factorial(2)...
#     Task C: Compute factorial(2)...
#     Task A: factorial(2) = 2
#     Task B: Compute factorial(3)...
#     Task C: Compute factorial(3)...
#     Task B: factorial(3) = 6
#     Task C: Compute factorial(4)...
#     Task C: factorial(4) = 24

multiprocessing

multithreading

concurrent

Encoding declarations

https://docs.python.org/3.6/reference/lexical_analysis.html#encoding-declarations
Put the following comment on the first or second line of the Python script:
# -*- coding: <encoding-name> -*-

Reading environment variables

import os
os.environ.get('X509_USER_PROXY')

min

A less common usage: pass a key function to pick the minimum by a nested value.

>>> a = [{"test":{"val1":1}, "name":"A"}, {"test":{"val1":2}, "name": "B"}]
>>> min(a, key=lambda k: k["test"]["val1"])
{'test': {'val1': 1}, 'name': 'A'}

zip

>>> list(zip([1, 2, 3], [4, 5, 6]))
[(1, 4), (2, 5), (3, 6)]

enumerate

>>> list(enumerate(['a', 'b', 'c']))
[(0, 'a'), (1, 'b'), (2, 'c')]

getattr

>>> class A(object):
...     bar = 1
...
>>> a = A()
>>> getattr(a, 'bar')
1

This can be handy when attribute names are determined programmatically.
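As an illustration of that use, here is a small sketch that collects several attributes by name and falls back to a default for missing ones. `Job` and `pick` are hypothetical names made up for this example:

```python
class Job:
    """Hypothetical record used only for this illustration."""

    def __init__(self):
        self.name = "transfer"
        self.state = "done"


def pick(obj, names, default=None):
    """Collect the named attributes into a dict, using `default` for missing ones."""
    return {name: getattr(obj, name, default) for name in names}
```

`pick(Job(), ["name", "state", "owner"])` returns the two existing attributes plus `None` for `owner`, because the third argument of `getattr` suppresses the AttributeError.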

Friday, January 10, 2020

cvmfs note

cvmfs TS

@ WN
systemctl restart autofs
cvmfs_config wipecache (or remove /var/cache/cvmfs2/shared)
cvmfs_config probe

@ cvmfs server
cvmfs_server resign -d 120

  • /tmp/cvmfs.log
  • /etc/cvmfs/default.local

Thursday, January 2, 2020

javascript

javascript notes

var that = this

If you run an ajax call inside an object and want to set one of the object's properties in .done (or success), you may need this trick.

Example:

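class testObj {
    constructor() {
        // Keep a reference to the instance; inside the callback `this` is no longer the object.
        var that = this
        $.ajax({
            url: "someurl...",
            type: "GET",
            dataType: "json"
        }).done(function(data) {
            console.log(data)
            that.hostname = data.info.hostname
        })
    }
};
a = new testObj()
// The request is asynchronous, so this prints undefined until .done has run.
console.log(a.hostname)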

That looks rather ugly; there is another way:


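class testObj {
    constructor() {
        $.ajax({
            url: "someurl...",
            type: "GET",
            dataType: "json",
            // `context` makes jQuery call the callbacks with `this` set to the instance.
            context: this
        }).done(function(data) {
            console.log(data)
            this.hostname = data.info.hostname
        })
    }
};
a = new testObj()
// The request is asynchronous, so this prints undefined until .done has run.
console.log(a.hostname)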

Hmm… still ugly.
Writing good-looking JavaScript is not easy.

Spread syntax

It can be used to combine two arrays.

example

a = [1, 2, 3]
b = [4, 5, 6]
console.log([...a, ...b])
// expected: [1, 2, 3, 4, 5, 6]

response json

Format of the response JSON

Error handling

See Alteditor for reference.

A standard format for an error response might look like this:

{
	"responseJSON": {
		"errors": {
			"error1": "error1 message",
			"error2": "error2 message"
		}
	}
}

If I ever get the chance to write a backend myself, I should keep this convention in mind.
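On the backend side, producing such a body could be as simple as the sketch below. It assumes that `responseJSON` in the example is the property under which jQuery exposes the parsed body, so the server itself would send only the inner `errors` object; the function name is hypothetical:

```python
import json


def error_body(errors):
    """Serialize a mapping of error name -> message under a top-level "errors" key."""
    return json.dumps({"errors": dict(errors)})
```

For instance, `error_body({"error1": "error1 message"})` yields a JSON string that a client would then see as `responseJSON.errors.error1`.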