Concurrency and Parallelism

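(*) This post is not written for computer science majors. (^^) It is for people like me, who work on the edges of IT and have to hold conversations with those who did major in it. I believe that in the financial industry, too, business experts need a deep understanding of computing. Only then can they apply the technology that fits the business requirements and build the best possible system.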

1.
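When Wall Street trading systems are introduced, I often see the word ‘Parallel’. I run into the same word when reading about strategies for achieving low latency. Looking at projects such as OpenMP, MPICH, OpenCL and CUDA, I began to wonder how they relate to threads and concurrency.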

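The first time I came across the concept of concurrency at work was around 1994, during the Nowcom project. My computer knowledge was limited, of course, but the goal of “building a system that many users can use at once” included a task called Concurrent Server Design. I believe the books the developers studied at the time were those of Richard Stevens.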

UNIX Network Programming, Volume 2, Second Edition: Interprocess Communications, Prentice Hall, 1999
UNIX Network Programming, Volume 1, Second Edition: Networking APIs: Sockets and XTI, Prentice Hall
TCP/IP Illustrated, Volume 3: TCP for Transactions, HTTP, NNTP, and the UNIX Domain Protocols
TCP/IP Illustrated, Volume 2: The Implementation, Addison-Wesley, 1995
TCP/IP Illustrated, Volume 1: The Protocols, Addison-Wesley, 1994
Advanced Programming in the UNIX Environment, Addison-Wesley, 1992
UNIX Network Programming, Prentice Hall, 1990

Most of them were the original English editions. The concurrent server models Stevens proposed at the time were as follows.

One Fork Per Client Request
Prefork with child calling accept
Prefork with file locking to protect accept
Prefork with thread mutex locking to protect accept
Prefork with parent passing socket descriptor to child
Create one thread per client request
Prethreaded with mutex locking to protect accept
Prethreaded with main thread calling accept

Of these, the gateway server that accepted TCP calls from clients used One Fork Per Client Request, and the service server that processed the incoming requests used Prefork with parent passing socket descriptor to child. Later, when we redeveloped the gateway server, we switched it to Prethreaded with main thread calling accept.
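As a rough illustration of the last model mentioned above, Prethreaded with main thread calling accept, here is a minimal Python sketch. It is not the code from the project: the names `worker`, `serve` and `demo`, the pool size, and the "reply in upper case" behavior are all assumptions made for the example. The idea it tries to show is that the worker threads are created up front, only one thread ever calls `accept`, and each accepted connection is handed to the pool through a queue.

```python
import queue
import socket
import threading


def worker(conn_queue):
    """Pre-created worker: take accepted connections off the queue and answer them."""
    while True:
        conn = conn_queue.get()
        if conn is None:              # sentinel value: time to shut down
            break
        with conn:
            data = conn.recv(1024)
            conn.sendall(data.upper())   # hypothetical "service": reply in upper case


def serve(listener, n_workers, n_connections):
    """Accept n_connections in the calling thread and hand each one to the pool."""
    conn_queue = queue.Queue()
    workers = [threading.Thread(target=worker, args=(conn_queue,))
               for _ in range(n_workers)]
    for t in workers:
        t.start()
    for _ in range(n_connections):
        conn, _addr = listener.accept()  # only this thread ever calls accept
        conn_queue.put(conn)
    for _ in workers:
        conn_queue.put(None)             # one sentinel per worker
    for t in workers:
        t.join()


def demo():
    """Run the server on a loopback port and talk to it with three clients."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    listener.listen()
    port = listener.getsockname()[1]

    server = threading.Thread(target=serve, args=(listener, 2, 3))
    server.start()

    replies = []
    for msg in (b"buy", b"sell", b"hold"):
        with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
            client.sendall(msg)
            replies.append(client.recv(1024))

    server.join()
    listener.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

Note that nothing in this sketch says which CPU a worker runs on. That decision is left to the operating system.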

The server hardware we mainly used then was the SUN Enterprise 3500, with four CPUs. Even so, we did not define tasks per CPU when designing the server programs. Which CPU ran a given process or thread was left entirely to the OS (Solaris).

That is concurrency as I know it: several processes doing the same task run at the same time. They can be processes, or they can be threads.

2.
Even without knowing exactly what "parallel" means in computing, the words parallel and concurrent carry a similar nuance on their face.

“If things are to happen at the same time, can that be done sequentially rather than in parallel?”

Yet they are said to be different, so I looked into how concurrency and parallelism differ. Let's start with a passage that defines the difference through a trading analogy.

Consider the stock market. On any given day there are thousands of traders buying and selling from each other. Each trader is acting broadly sequentially, but the group of them put together are an interacting concurrent system. If you took the same set of traders and the same stock levels and re-ran the system, you would not be surprised to get a different set of trades during the day. The traders are a concurrent system. In contrast, consider the matter of producing statistics about a day’s trading. There might be millions of trades to process, so you may consider splitting the work between several computers. But you would expect that no matter how you divided up the work, you should always get the same result; anything else would indicate a bug! The challenge is how to divide up the work to get it done fastest. This is a parallel processing problem.
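The second half of that passage, producing statistics about a day's trading, can be sketched in a few lines of Python. Everything here is hypothetical: the symbols, the generated trade list, and the function names `split`, `volume_of` and `daily_volume` are made up for illustration. The sketch only tries to show the property the quote describes: however the work is divided, the result should be the same.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split(items, n_parts):
    """Cut a list into at most n_parts contiguous chunks."""
    size = max(1, -(-len(items) // n_parts))   # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def volume_of(trades):
    """Total traded quantity per symbol for one chunk of trades."""
    totals = Counter()
    for symbol, quantity in trades:
        totals[symbol] += quantity
    return totals


def daily_volume(trades, n_workers):
    """Compute per-symbol volume by dividing the trades among n_workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        merged = Counter()
        for partial in pool.map(volume_of, split(trades, n_workers)):
            merged.update(partial)             # add the partial totals together
    return merged


# Hypothetical data: 9,000 trades over three made-up symbols.
SYMBOLS = ("AAA", "BBB", "CCC")
trades = [(SYMBOLS[i % 3], (i * 37) % 500 + 1) for i in range(9_000)]

# Divide the same work 1, 2, 4 and 7 ways; the statistics must not change.
results = [daily_volume(trades, n) for n in (1, 2, 4, 7)]
all_equal = all(r == results[0] for r in results)
print(all_equal)
```

A thread pool is used here only to keep the sketch short. On a standard CPython build it will not deliver a real multi-core speedup; for that one would reach for separate processes or separate machines, which is exactly the "how to divide up the work" challenge the quote refers to.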

Let's also look at an explanation in computing terms. An article that ran in Linux Magazine around the time multicore appeared caught my eye.

The terms concurrent and parallel are often used interchangeably, but the two aren’t the same thing. Concurrency is a property of a program or algorithm. If parts of a program can run independently, those parts are considered concurrent. If the independent parts can run on separate processors, then the program can further be called parallel. Admittedly, the distinction between concurrent and parallel is subtle, so it may help to keep these three rules in mind:
1. Concurrency doesn’t necessarily imply parallel execution.
2. Concurrency is a property of the program.
3. Efficient parallel execution of concurrent programs depends on the hardware.
Not all programs can be made concurrent, and it is not necessarily better to run concurrent sections of code in parallel. In some cases, running concurrent portions in parallel may actually slow your application.
From "A Stroll Down Concurrency Lane? With the introduction of multi-core processors, parallel programmers face a tough decision"

The definition above ties parallelism to hardware. Why? Let's look at how Wikipedia defines concurrent computing.

Concurrent computing is a form of computing in which programs are designed as collections of interacting computational processes that may be executed in parallel. (Note the emphasis: they may be executed at the same time.)

Concurrent programs can be executed sequentially on a single processor by interleaving the execution steps of each computational process, or executed in parallel by assigning each computational process to one of a set of processors that may be close or distributed across a network. (In other words, on a single-core machine a concurrent program is time-sliced: it appears to run simultaneously but in the end runs sequentially, while on a physically parallel machine the computation is distributed and really does run at the same time.)
From "Concurrent와 Parallel의 차이" (The Difference Between Concurrent and Parallel)

The point is that concurrency, which can achieve the same effect as parallelism even on a single CPU with a single core through time-sharing, is different from parallelism, which rests on multiple CPUs or multiple cores.

But even before multicore arrived in 2005, server products from Sun and IBM were multi-CPU machines. That makes me wonder whether they were not already a parallel environment.

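With one processor (CPU), concurrent work is processed sequentially; with two or more processors, concurrent work can also be processed in parallel. When a single processor works through several concurrent tasks sequentially, it does a little of task A, pauses, moves on to task B, and repeats this switching so quickly that the user can have the illusion of parallel processing.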
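The "a little of A, then a little of B" behavior can be imitated with a toy scheduler. This is only a sketch under simplifying assumptions: Python generators stand in for tasks, each `yield` stands in for the end of a time slice, and the names `task` and `round_robin` are made up. A real operating system preempts tasks on a timer rather than waiting for them to yield.

```python
from collections import deque


def task(name, steps):
    """A task that does `steps` slices of work, pausing after each one."""
    for i in range(1, steps + 1):
        yield f"{name}{i}"       # one slice of work done; give the processor back


def round_robin(tasks):
    """Run all tasks on ONE processor by taking turns, one slice at a time."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))   # run the task for one slice
            ready.append(current)         # not finished: back of the line
        except StopIteration:
            pass                          # finished: drop it
    return trace


trace = round_robin([task("A", 3), task("B", 2)])
print(" ".join(trace))   # A1 B1 A2 B2 A3
```

Only one slice is ever running at any instant, so nothing here is parallel, yet both A and B make progress "at the same time" from the user's point of view.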

Douglas Eadline explains it as follows in the same article.

If you run a multi-tasking operating system on a multi-processor system, it’s possible to run two or more processes on different CPUs at the same time. This trick is a natural “concurrency” that is easily exploited by systems with more than one low-cost CPU.

Processor speeds have been doubling every eighteen months, but RAM and hard disk speeds have been lagging behind. Doing things in parallel is one way to get around hardware subsystem limitations.

Predictions indicate that processor speeds cannot continue to double every eighteen months after 2005. Indeed, the switch to multi-core has been prompted by an increasingly insurmountable “GHz wall.” Thus, instead of increasing clock speeds at greater and greater expense, chip makers are offering multiple-core CPUs at lower clock speeds.

Depending on the application, parallel computing can speed things up anywhere from two to N times faster, where N is the number of processors. Such performance is not achievable using a single processor. To wit, most supercomputers that at one time used very fast custom processors are now built from multiple, “commodity-off-the-shelf” CPUs.

3.
2005 is said to have been a very important year for both software and hardware.

Until 2005, software was written to rely on the clock speed that the hardware's CPU provided. In The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software, Herb Sutter calls this style of software development a ‘Free Lunch’ and writes as follows.

There’s an interesting phenomenon that’s known as “Andy giveth, and Bill taketh away.” No matter how fast processors get, software consistently finds new ways to eat up the extra speed. Make a CPU ten times as fast, and software will usually find ten times as much to do (or, in some cases, will feel at liberty to do it ten times less efficiently). Most classes of applications have enjoyed free and regular performance gains for several decades, even without releasing new versions or doing anything special, because the CPU manufacturers (primarily) and memory and disk manufacturers (secondarily) have reliably enabled ever-newer and ever-faster mainstream systems. Clock speed isn’t the only measure of performance, or even necessarily a good one, but it’s an instructive one: We’re used to seeing 500MHz CPUs give way to 1GHz CPUs give way to 2GHz CPUs, and so on. Today we’re in the 3GHz range on mainstream computers.

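A faster CPU means the same work gets done correspondingly faster. Even if a program's architecture is dated, and even without recompiling it, the program can run faster simply by leaning on the speed of the latest CPU. That is why sequential (nonparallel, single-threaded, single-process) applications could achieve performance similar to applications that used concurrency techniques.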

But once CPU speeds stopped improving the way Moore's Law had suggested, the free lunch was over. Efforts began to raise CPU performance in new ways.

HyperThreading. Multicore. On-Die Cache.

Hyperthreading is about running two or more threads in parallel inside a single CPU. Multicore is about running two or more actual CPUs on one chip. On-die cache sizes can be expected to continue to grow

We are told that if we truly want to improve application performance through concurrency, we have to keep pace with the changes in CPU technology that provide an environment for parallel computing.

Let's go back to where we started: concurrency and parallelism.

The difference between the two is summed up in a single sentence.

Concurrency is a property of the program and parallel execution is a property of the machine.

Spelled out, it means the following.

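Starting with concurrency: if the parts of a program or algorithm can be executed at the same time regardless of order, we say it is concurrent. Take adding the numbers from 1 to 100. We split the 100 numbers into several subsets and compute the partial sums at the same time. Adding those partial sums together then gives the value we wanted. In that case we say the algorithm is concurrent. But whether this algorithm will really run physically in parallel can only be answered once we know what hardware it will run on. Only when the algorithm runs on a multiprocessor machine do we say it is executed in parallel. Parallel execution is therefore a property of the hardware rather than of the program.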

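A concurrent program certainly runs on a single-core machine too. Mutexes and deadlocks are fully meaningful on a single core. A multithreaded program may, because of physical constraints, run in time-sliced fashion on a single core, yet from the outside it can pass itself off as operating concurrently. Conversely, however multithreaded a program is written to be, without multiple cores we cannot say it runs in parallel.
From "Concurrent와 Parallel의 차이" (The Difference Between Concurrent and Parallel)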
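The 1-to-100 example from that passage is small enough to write out. The following is a minimal Python sketch, not anything taken from the quoted article; the function name `concurrent_sum` and the choice of four workers are assumptions. Each thread adds up its own subset, and a mutex (`threading.Lock`) protects the shared total, which is needed whether the threads end up on one core or on several.

```python
import os
import threading


def concurrent_sum(numbers, n_workers):
    """Sum `numbers` by computing partial sums in separate threads."""
    total = 0
    lock = threading.Lock()

    def add_part(part):
        nonlocal total
        subtotal = sum(part)     # independent work: order does not matter
        with lock:               # mutex: one thread at a time updates the total
            total += subtotal

    size = max(1, -(-len(numbers) // n_workers))   # ceiling division
    parts = [numbers[i:i + size] for i in range(0, len(numbers), size)]

    threads = [threading.Thread(target=add_part, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


numbers = list(range(1, 101))
result = concurrent_sum(numbers, 4)
print(result)                    # 5050

# The program above is concurrent on any machine. Whether it could run in
# parallel depends on the hardware, for example the number of logical CPUs.
logical_cpus = os.cpu_count()
print(logical_cpus)
```

The program is the same on a single-core laptop and on a four-CPU server; only the machine decides whether the partial sums are actually computed at the same instant. (On a standard CPython build the interpreter's global lock makes these threads take turns even on a multi-core machine, which happens to illustrate the same point from another angle.)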

Of the articles I found, this is the best-organized one written in Korean.

So where do we start? There are several tools related to parallel computing.

MPI, OpenMP, CUDA, OpenCL

How should these tools be used? For reference, here is the table from The HPC Software Conundrum, published in Linux Magazine.

[Chart: table from The HPC Software Conundrum]

And if you want to keep studying the subject, I recommend Dr. Dobb’s Journal as a reference.