Strategy Security and Linux

1.
It was at a briefing session. A customer asked a question.

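“Virtualized or not, if I am given hardware resources, can other people access them?”
“The system administrator manages the permissions.”
“Then isn't there a risk that the strategy I developed gets exposed?”
“We call it a strategy, but it is an object file compiled from C code, so it is hard to read.”
“I'm still uneasy. Is there a way to leave no trace? Could we use memory, perhaps?”
“Hmm. It is possible if you create a memory disk, but traces still remain, so it doesn't look easy.”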

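Traders value their strategies more than their lives. They want them hidden away as carefully as the rabbit's liver that the turtle is after in the old folktale. That is why ZeroAOS needs proper security measures to protect a trader's strategy from outsiders. Of course, the best way to protect a strategy from outsiders is for the trader to buy hardware over which they hold full authority and responsibility. But that costs money: not only the initial purchase but also ongoing operating expenses. And if you are not used to running servers, operational risk goes up. Reducing risk raises costs, so the target return has to be raised as well. The more authority you are willing to give up, the lower the cost becomes, and the target return can be adjusted accordingly.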

The starting point was finding a unique ID that is tied to the hardware. Linux's HostID came to mind. But there was a problem: the HostID can be tampered with.
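To see why a changeable HostID is a problem, here is a minimal sketch. It assumes glibc behaviour as I understand it, where gethostid() returns the 4 raw bytes stored in /etc/hostid when that file exists. The helper names and the scratch path are hypothetical, and the demo never touches the real file.

```python
import os
import struct
import tempfile


def write_hostid(path, value):
    """Store a 32-bit host ID as 4 raw bytes in native byte order."""
    with open(path, "wb") as f:
        f.write(struct.pack("=I", value & 0xFFFFFFFF))


def read_hostid(path="/etc/hostid"):
    """Return the stored host ID as an int, or None if the file is missing or malformed."""
    try:
        with open(path, "rb") as f:
            data = f.read(4)
    except OSError:
        return None
    if len(data) != 4:
        return None
    return struct.unpack("=I", data)[0]


if __name__ == "__main__":
    # Assumption: a scratch file stands in for /etc/hostid.
    scratch = os.path.join(tempfile.mkdtemp(), "hostid")
    write_hostid(scratch, 0x0A0B0C0D)
    print(hex(read_hostid(scratch)))
    # Anyone who can write the file can pick any value they like.
    write_hostid(scratch, 0xDEADBEEF)
    print(hex(read_hostid(scratch)))
```

Whoever can write that file can choose the ID, so a licence check built on it only stops people who lack root on their own machine.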

Change Linux Host ID

I gave up on it and looked for alternatives. A little research turned up the following usable IDs: the hardware IDs and software IDs that Linux provides.

First, the hardware IDs.

1. /sys/class/dmi/id/product_uuid: The main board product UUID, as set by the board manufacturer and encoded in the BIOS DMI information. It may be used to identify a mainboard and only the mainboard. It changes when the user replaces the main board. Also, often enough BIOS manufacturers write bogus serials into it. In addition, it is x86-specific. Access for unprivileged users is forbidden. Hence it is of little general use.

2. CPUID/EAX=3 CPU serial number: A CPU UUID, as set by the CPU manufacturer and encoded on the CPU chip. It may be used to identify a CPU and only a CPU. It changes when the user replaces the CPU. Also, most modern CPUs don’t implement this feature anymore, and older computers tend to disable this option by default, controllable via a BIOS Setup option. In addition, it is x86-specific. Hence this too is of little general use.

3. /sys/class/net/*/address: One or more network MAC addresses, as set by the network adapter manufacturer and encoded on some network card EEPROM. It changes when the user replaces the network card. Since network cards are optional and there may be more than one the availability of this ID is not guaranteed and you might have more than one to choose from. On virtual machines the MAC addresses tend to be random. This too is hence of little general use.

4. /sys/bus/usb/devices/*/serial: Serial numbers of various USB devices, as encoded in the USB device EEPROM. Most devices don’t have a serial number set, and if they have it is often bogus. If the user replaces his USB hardware or plugs it into another machine these IDs may change or appear in other machines. This hence too is of little use.
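Three of the four hardware IDs above are plain files, so they can be collected with ordinary file reads. The sketch below is only an illustration: the function names and the sys_root parameter are my own, the CPUID serial is left out because it is not exposed as a file, and unreadable entries simply come back as None.

```python
import glob
import os


def read_text(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        with open(path, "r", errors="replace") as f:
            return f.read().strip()
    except OSError:  # covers missing files and permission errors
        return None


def hardware_ids(sys_root="/sys"):
    """Collect the file-based hardware IDs found under sys_root."""
    macs = {}
    for path in glob.glob(os.path.join(sys_root, "class/net/*/address")):
        iface = os.path.basename(os.path.dirname(path))
        macs[iface] = read_text(path)

    usb = {}
    for path in glob.glob(os.path.join(sys_root, "bus/usb/devices/*/serial")):
        device = os.path.basename(os.path.dirname(path))
        usb[device] = read_text(path)

    return {
        # Usually None for unprivileged users.
        "product_uuid": read_text(os.path.join(sys_root, "class/dmi/id/product_uuid")),
        "mac_addresses": macs,
        "usb_serials": usb,
    }


if __name__ == "__main__":
    for key, value in hardware_ids().items():
        print(key, value)
```

Run as an ordinary user, the product_uuid entry will typically be None, which is the access restriction mentioned in item 1.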

The software IDs are well documented too. It is a bit long, but do read through it once. It will get stored somewhere in the back of your mind.

/proc/sys/kernel/random/boot_id: A random ID that is regenerated on each boot. As such it can be used to identify the local machine’s current boot. It’s universally available on any recent Linux kernel. It’s a good and safe choice if you need to identify a specific boot on a specific booted kernel.

gethostname(), /proc/sys/kernel/hostname: A non-random ID configured by the administrator to identify a machine in the network. Often this is not set at all or is set to some default value such as localhost and not even unique in the local network. In addition it might change during runtime, for example because it changes based on updated DHCP information. As such it is almost entirely useless for anything but presentation to the user. It has very weak semantics and relies on correct configuration by the administrator. Don’t use this to identify machines in a distributed environment. It won’t work unless centrally administered, which makes it useless in a globalized, mobile world. It has no place in automatically generated filenames that shall be bound to specific hosts. Just don’t use it, please. It’s really not what many people think it is. gethostname() is standardized in POSIX and hence portable to other Unixes.

IP Addresses returned by SIOCGIFCONF or the respective Netlink APIs: These tend to be dynamically assigned and often enough only valid on local networks or even only the local links (i.e. 192.168.x.x style addresses, or even 169.254.x.x/IPv4LL). Unfortunately they hence have little use outside of networking.

gethostid(): Returns a supposedly unique 32-bit identifier for the current machine. The semantics of this is not clear. On most machines this simply returns a value based on a local IPv4 address. On others it is administrator controlled via the /etc/hostid file. Since the semantics of this ID are not clear and most often is just a value based on the IP address it is almost always the wrong choice to use. On top of that 32bit are not particularly a lot. On the other hand this is standardized in POSIX and hence portable to other Unixes. It’s probably best to ignore this value and if people don’t want to ignore it they should probably symlink /etc/hostid to /var/lib/dbus/machine-id or something similar.

/var/lib/dbus/machine-id: An ID identifying a specific Linux/Unix installation. It does not change if hardware is replaced. It is not unreliable in virtualized environments. This value has clear semantics and is considered part of the D-Bus API. It is supposedly globally unique and portable to all systems that have D-Bus. On Linux, it is universally available, given that almost all non-embedded and even a fair share of the embedded machines ship D-Bus now. This is the recommended way to identify a machine, possibly with a fallback to the host name to cover systems that still lack D-Bus. If your application links against libdbus, you may access this ID with dbus_get_local_machine_id(), if not you can read it directly from the file system.

/proc/self/sessionid: An ID identifying a specific Linux login session. This ID is maintained by the kernel and part of the auditing logic. It is uniquely assigned to each login session during a specific system boot, shared by each process of a session, even across su/sudo and cannot be changed by userspace. Unfortunately some distributions have so far failed to set things up properly for this to work (Hey, you, Ubuntu!), and this ID is always (uint32_t) -1 for them. But there’s hope they get this fixed eventually. Nonetheless it is a good choice for a unique session identifier on the local machine and for the current boot. To make this ID globally unique it is best combined with /proc/sys/kernel/random/boot_id.

getuid(): An ID identifying a specific Unix/Linux user. This ID is usually automatically assigned when a user is created. It is not unique across machines and may be reassigned to a different user if the original user was deleted. As such it should be used only locally and with the limited validity in time in mind. To make this ID globally unique it is not sufficient to combine it with /var/lib/dbus/machine-id, because the same ID might be used for a different user that is created later with the same UID. Nonetheless this combination is often good enough. It is available on all POSIX systems.

ID_FS_UUID: an ID that identifies a specific file system in the udev tree. It is not always clear how these serials are generated but this tends to be available on almost all modern disk file systems. It is not available for NFS mounts or virtual file systems. Nonetheless this is often a good way to identify a file system, and in the case of the root directory even an installation. However due to the weakly defined generation semantics the D-Bus machine ID is generally preferable.
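Most of the software IDs in this list can be read without any special privilege. The following is a rough sketch rather than a recommended API: the function names are mine, and the fallback to /etc/machine-id is an assumption for systemd-based installations that the quoted text does not mention.

```python
import os
import socket


def read_text(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        with open(path, "r") as f:
            return f.read().strip()
    except OSError:
        return None


def software_ids():
    """Collect the software-level IDs described above."""
    machine_id = read_text("/var/lib/dbus/machine-id")
    if machine_id is None:
        # Assumption: systemd-based systems keep the same ID here.
        machine_id = read_text("/etc/machine-id")
    return {
        "boot_id": read_text("/proc/sys/kernel/random/boot_id"),
        "hostname": socket.gethostname(),
        "machine_id": machine_id,
        "session_id": read_text("/proc/self/sessionid"),
        "uid": os.getuid(),
    }


if __name__ == "__main__":
    for key, value in software_ids().items():
        print(key, value)
```

None of these values is bound to a physical machine, which is why none of them solves the problem in this post by itself.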

The material above comes from the blog of Lennart Poettering, the tireless evangelist of systemd.

On IDs

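One more candidate I considered was the trading account number, a business ID. The objection was that even if live trading would be difficult with a copied strategy module, someone could still run tests with it. Traders object to this as well.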

2.
The groundwork took a while. Now I have to pick one of these. I chose a UUID: product_uuid, the hardware ID built from BIOS information. So what is a UUID?

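A UUID, or universally unique identifier, is a 128-bit number. As the name says, it is generated to serve as a unique ID that can be used anywhere, so the 128-bit hex value must be unique. A UUID is made up of the following fields.
UUID = time low - time mid - time high and version - clock seq hi and reserved - clock seq low - node

Each field has the following meaning.
time low : the lowest 32 bits of the timestamp (bits 0 to 31)
time mid : the middle 16 bits of the timestamp (bits 32 to 47)
time high and version : a 16-bit value
– the lowest 12 bits of this value are the highest 12 bits of the timestamp (bits 48 to 59)
– the highest 4 bits of this value are the version number
clock seq hi and reserved : an 8-bit value
– the lowest 6 bits of this value are the highest 6 bits of the clock sequence (bits 8 to 13)
– the highest 2 bits are the variant (set to 10)
clock seq low : the lowest 8 bits of the clock sequence
node : the unique node number

The timestamp is based on Coordinated Universal Time (UTC) and is 60 bits wide. On a system where UTC is not available, duplicate UUIDs could be generated, so in that case the clock sequence is used to keep the generated UUIDs unique.
From “UUID (Universally Unique Identifiers)”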
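The field layout above can be checked with Python's standard uuid module. This is just a sketch for a time-based (version 1) UUID; the describe helper is my own name, and a board's product_uuid is whatever the vendor wrote into the BIOS, so it may not follow this layout.

```python
import uuid


def describe(u):
    """Split a UUID into the fields listed above."""
    return {
        "time_low": u.time_low,                    # timestamp bits 0-31
        "time_mid": u.time_mid,                    # timestamp bits 32-47
        "time_hi_version": u.time_hi_version,      # version + timestamp bits 48-59
        "clock_seq_hi_variant": u.clock_seq_hi_variant,
        "clock_seq_low": u.clock_seq_low,
        "node": u.node,
        "timestamp": u.time,                       # the reassembled 60-bit timestamp
        "clock_seq": u.clock_seq,                  # the reassembled 14-bit clock sequence
        "version": u.version,
        "is_rfc4122": u.variant == uuid.RFC_4122,  # variant bits are 10
    }


if __name__ == "__main__":
    sample = uuid.uuid1()  # a freshly generated time-based UUID
    print(sample)
    for name, value in describe(sample).items():
        print(f"{name:>22}: {value}")
```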

That was long again. The conclusion is that a UUID is unique and not easy to duplicate. What remains is how to obtain product_uuid in software. And here another thorny issue came up: root privilege.

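As the material above shows, product_uuid lives under the /sys directory. The file system mounted under /sys is called sysfs. sysfs is usually described as a file system that consolidates the proc, devfs, and devpty file systems: it presents the devices and buses attached to the system as a single file system tree that users can access. It was developed to handle the device- and driver-specific options that used to live in the /proc file system, and it has to be mounted at /sys. Reading product_uuid from sysfs requires root privilege.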
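Because sysfs exposes each attribute as a file, the restriction shows up as ordinary permission bits. The sketch below lists the attributes in a sysfs directory together with their mode and whether the current user can read them. The function name is hypothetical, and the exact modes you see will depend on the kernel.

```python
import os
import stat


def list_attributes(directory="/sys/class/dmi/id"):
    """Return (name, permission bits, readable-by-me) for each regular file."""
    rows = []
    try:
        names = sorted(os.listdir(directory))
    except OSError:
        return rows  # directory missing, e.g. not an x86 machine
    for name in names:
        path = os.path.join(directory, name)
        try:
            st = os.stat(path)
        except OSError:
            continue
        if not stat.S_ISREG(st.st_mode):
            continue  # skip subdirectories and links to directories
        rows.append((name, stat.S_IMODE(st.st_mode), os.access(path, os.R_OK)))
    return rows


if __name__ == "__main__":
    for name, mode, readable in list_attributes():
        print(f"{mode:04o} {'readable' if readable else 'denied  '} {name}")
```

On the machines I would expect this to run on, attributes such as product_uuid show up as owner-only, while entries like the vendor or product name are world-readable.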

Another option is the dmidecode command, which reads the SMBIOS information and prints it.

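As expected, getting the information you want again requires root privilege. Letting an application run with root privilege is extremely dangerous, so this is not an approach you should use.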
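For reference, this is roughly what calling dmidecode from a program looks like. It is a sketch only, using the -s system-uuid option of dmidecode; the wrapper name is mine, and the function returns None whenever the command is missing, fails, or prints nothing, which is what I would expect for an unprivileged user.

```python
import subprocess


def system_uuid_via_dmidecode(command="dmidecode"):
    """Ask dmidecode for the system UUID; return None on any failure."""
    try:
        result = subprocess.run(
            [command, "-s", "system-uuid"],
            capture_output=True,
            text=True,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # command not installed, not executable, or hung
    if result.returncode != 0:
        return None
    value = result.stdout.strip()
    return value or None


if __name__ == "__main__":
    print(system_uuid_via_dmidecode())
```

Making this work for an ordinary user would mean setuid tricks or sudo rules, which is exactly the risk described above.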

I kept wondering how else to get at it. While searching, I came across the HAL project.
Architecture
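In the early days of Linux, adding new hardware was really hard. You had to find the right driver and go through manual steps before the kernel would recognize the device. At some point that chore disappeared. Linux started offering a way for ordinary users to install hardware as easily as on Windows. The concept behind it is the Hardware Abstraction Layer. Whether the card is from 3Com or Intel, the idea seems to be to put an abstraction called "network card" in between so that hardware can be managed indirectly. So there is a HAL daemon, which talks to clients over D-Bus and carries out hardware-related functions. That means you can get the information you want through HAL-related programs. The command for this is hal-get-property.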

HAL has a problem, though. The project has been discontinued and is reportedly being folded into the udev project. udev is a similar concept.

libudev and Sysfs Tutorial

The udevadm command lets you see all of this information. Shall we look for the UUID?

Can you see the difference? What you get to see differs depending on whether or not you have root privilege.

4.
When I develop, I don't like building programs on top of shell commands. I advise against it. Whether it is HAL or udev, I would rather get the value by talking to the daemon whenever possible.
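As one illustration of querying through a library instead of a command, libudev can be called directly. The sketch below does so through ctypes, using the libudev functions udev_new, udev_device_new_from_syspath, and udev_device_get_sysattr_value. Note that libudev reads sysfs rather than asking a daemon, so the permission limits still apply. The wrapper name and the device path are my assumptions, and this is not a description of what ZeroAOS does.

```python
import ctypes
import ctypes.util


def udev_sysattr(syspath, attr):
    """Read one sysfs attribute through libudev; return None on any failure."""
    libname = ctypes.util.find_library("udev")
    if libname is None:
        return None  # libudev is not installed
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None

    # Declare the C signatures so pointers are not truncated to int.
    lib.udev_new.restype = ctypes.c_void_p
    lib.udev_unref.argtypes = [ctypes.c_void_p]
    lib.udev_unref.restype = ctypes.c_void_p
    lib.udev_device_new_from_syspath.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.udev_device_new_from_syspath.restype = ctypes.c_void_p
    lib.udev_device_get_sysattr_value.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.udev_device_get_sysattr_value.restype = ctypes.c_char_p
    lib.udev_device_unref.argtypes = [ctypes.c_void_p]
    lib.udev_device_unref.restype = ctypes.c_void_p

    ctx = lib.udev_new()
    if not ctx:
        return None
    try:
        dev = lib.udev_device_new_from_syspath(ctx, syspath.encode())
        if not dev:
            return None  # no such device
        try:
            value = lib.udev_device_get_sysattr_value(dev, attr.encode())
            # ctypes copies the C string, so it stays valid after unref.
            return value.decode(errors="replace").strip() if value is not None else None
        finally:
            lib.udev_device_unref(dev)
    finally:
        lib.udev_unref(ctx)


if __name__ == "__main__":
    # Assumption: the DMI device lives at this sysfs path.
    print(udev_sysattr("/sys/devices/virtual/dmi/id", "product_uuid"))
```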

So how did ZeroAOS do it? I'll leave that to your imagination.

2 Comments

지나는 사람 (Passerby), April 13, 2012 at 1:33 AM

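You can use a private filesystem together with remote execution. If a suitable remote execution tool is hard to find, you can implement Dynamic Code Execution (a code runner plus the code to run) yourself.
The program source and the executable both stay on the local server..
At run time you either create a private filesystem on the remote server, copy the executable over, and run it…
or you start a code runner on the remote server and send it the code to execute..
You should be able to find material by searching for the keywords I wrote in English.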

1. smallake, April 13, 2012 at 2:19 AM
  
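  I understand the concept you describe. Dynamic Code Execution or a Private And Encrypted File System both sound like good approaches.
  
  I do wonder, though, whether they can resolve the trader's fundamental unease. “I'm not comfortable putting my strategy on someone else's server.” Because of that thought, they end up deciding to buy and install their own server.
  
  In the case of ZeroAOS, we develop the executable. The customer only has to copy an object file into a designated directory, but even that was considered a problem, which is why I came up with this approach. The Encrypted Filesystem you mentioned is something I should try out.
  
  It probably won't affect speed, though…
  
  Thank you.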
  