Kwangmin Kim - Pytorch Introduction

아직 GPU를 못잡았음 -> Google Colab에서만 돌아감

1 Pytorch Overview

PyTorch는 기계 학습 프레임워크(framework) 중 하나다.
- PyTorch의 텐서(tensor)는 NumPy 배열과 매우 유사하다.
- Tensor flow 보다 사용 비중이 늘어나고 있다.
- Tensor: 고차원 데이터 (배열)를 의미, 3차원 배열 이상
PyTorch를 사용하면, GPU 연동을 통해 효율적으로 딥러닝 모델을 학습할 수 있다.
Google Colab을 이용하면, 손쉽게 PyTorch를 시작할 수 있다.
Google Colab에서는 [런타임] - [런타임 유형 변경]에서 GPU를 선택할 수 있다.
Google Colab에선 pytoch가 내장되어 있기 때문에 따로 설치할 필요 없음

1.1 GPU 사용 여부 체크하기

텐서간의 연산을 수행할 때, 기본적으로 두 텐서가 같은 장치에 있어야 한다.
연산을 수행하는 텐서들을 모두 GPU에 올린 뒤에 연산을 수행한다.
GPU는 고차원 행렬곱같은 병렬 처리 연산에 최적화 되어 있다.

tensor 자체가 고차원 배열이기 때문에 데이터를 불러오면 tensor 형태로 바꿀 수 있다.

1.2 텐서(tensor) 객체 생성 (기본 python 데이터 유형)

import torch

# data initialization: 초기화된 데이터는 gpu에 있음
data = [
  [1, 2],
  [3, 4]
]

x = torch.tensor(data) # list를 tensor 형태로 바꾸기. 
print(x.is_cuda)

x = x.cuda() # CPU -> GPU로 옮기기
print(x.is_cuda)

x = x.cpu() # GPU -> CPU로 옮기기
print(x.is_cuda)

서로 다른 장치(device)에 있는 텐서끼리 연산을 수행하면 오류가 발생한다.

# GPU 장치의 텐서
a = torch.tensor([
    [1, 1],
    [2, 2]
]).cuda()

# CPU 장치의 텐서
b = torch.tensor([
    [5, 6],
    [7, 8]
])

# print(torch.matmul(a, b)) # 오류 발생
print(torch.matmul(a.cpu(), b))

1.3 2. 텐서 소개 및 생성 방법

1.3.1 1) 텐서의 속성

텐서의 기본 속성으로는 다음과 같은 것들이 있다.
- 모양(shape): 텐서 객체의 차원을 확인할 수 있다. (tensor_var.shape)
- 자료형(data type) : 텐서의 기본 자료형은 float type (tensor_var.dtype)
- 저장된 장치: CPU인지 GPU인지 확인 (tensor_var.device)

tensor = torch.rand(3, 4)

print(tensor)
print(f"Shape: {tensor.shape}")
print(f"Data type: {tensor.dtype}")
print(f"Device: {tensor.device}")

1.3.2 2) 텐서 초기화

리스트 데이터에서 직접 텐서를 초기화할 수 있다.


data = [
  [1, 2],
  [3, 4]
]
x = torch.tensor(data)

print(x)

NumPy 배열에서 텐서를 초기화할 수 있다.

a = torch.tensor([5])
b = torch.tensor([7])

c = (a + b).numpy() # tensor -> numpy array
print(c)
print(type(c)) # ndarray: numpy array 

result = c * 10
tensor = torch.from_numpy(result) # numpy array -> tensor 
print(tensor)
print(type(tensor))

1.3.3 3) 다른 텐서로부터 data를 가져와 텐서 초기화하기

다른 텐서의 정보를 토대로 텐서를 초기화할 수 있다.
텐서의 속성은 모양(shape)과 자료형(data type)이 있다

x = torch.tensor([
    [5, 7],
    [1, 2]
])

# x와 같은 shape와 data type을 가지지만, 값이 1인 텐서 생성
x_ones = torch.ones_like(x)
print(x_ones)
# x와 같은 shape를 가지되, 자료형은 float으로 덮어쓰고, 값은 랜덤으로 채우기
x_rand = torch.rand_like(x, dtype=torch.float32) # uniform distribution [0, 1)
print(x_rand)

1.4 3. 텐서의 형변환 및 차원 조작

텐서는 넘파이(NumPy) 배열처럼 조작할 수 있다.

1.4.1 1) 텐서의 특정 차원 접근하기

텐서의 원하는 차원에 접근할 수 있다.

tensor = torch.tensor([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])

print(tensor[0]) # the first row
print(tensor[:, 0]) # indexing the first column with all the rows
# whatever the previous columns are, indexing the last column with all the rows
print(tensor[..., -1])

1.4.2 2) 텐서 이어붙이기(Concatenate)

두 텐서를 이어 붙여 연결하여 새로운 텐서를 만들 수 있다.


tensor = torch.tensor([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]
])

# dim: 텐서를 이어 붙이기 위한 축
# 0번 축(행)을 기준으로 이어 붙이기: 즉, row bind로 연결
result = torch.cat([tensor, tensor, tensor], dim=0) 
print(result)
print(result.shape) # 9x4

# 1번 축(열)을 기준으로 이어 붙이기: 즉, column bind로 연결
result = torch.cat([tensor, tensor, tensor], dim=1)  
print(result)
print(result.shape) # 3x12

1.4.3 3) 텐서 형변환(Type Casting)

텐서의 자료형(정수, 실수 등)을 변환할 수 있다.

a = torch.tensor([2], dtype=torch.int) # integers
b = torch.tensor([5.0]) # real numbers

print(a.dtype)
print(b.dtype)

# 텐서 a는 자동으로 float32형으로 형변환 처리
print(a + b)
# 텐서 b를 int32형으로 형변환하여 덧셈 수행
print(a + b.type(torch.int32))

1.4.4 4) 텐서의 모양 변경

view()는 텐서의 shape를 변경할 때 사용한다.
이때, 텐서(tensor)의 순서는 변경되지 않는다.

# view()는 텐서의 모양을 변경할 때 사용한다.
# 이때, 텐서(tensor)의 순서는 변경되지 않는다.
a = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8])
b = a.view(4, 2) # 4*2=8 개, # call by reference
print(b)

# a의 값을 변경하면 b도 변경: 메모리 주소값을 공유
a[0] = 7
print(b)

# a의 값을 복사(copy)한 뒤에 변경, 새로운 메모리값 할당
c = a.clone().view(4, 2) # call by value, 아예 다른 객체
a[0] = 9
print(c)

1.4.5 5) 텐서의 차원 교환

하나의 텐서에서 특정한 차원끼리 순서를 교체할 수 있다.

a = torch.rand((64, 32, 3))
print(a.shape)

b = a.permute(2, 1, 0) # 차원을 바꿔줌
# (2번째 축, 1번째 축, 0번째 축)의 형태가 되도록 한다.
print(b.shape)

1.5 4. 텐서의 연산과 함수

1.5.1 1) 텐서의 연산

텐서에 대하여 사칙연산 등 기본적인 연산을 수행할 수 있다.

# 같은 크기를 가진 두 개의 텐서에 대하여 사칙연산 가능
# 기본적으로 요소별(element-wise) 연산, 행렬의 연산과 다름
a = torch.tensor([
    [1, 2],
    [3, 4]
])
b = torch.tensor([
    [5, 6],
    [7, 8]
])
print(a + b)
print(a - b)
print(a * b)
print(a / b)

행렬 곱을 수행할 수 있다.


a = torch.tensor([
    [1, 2],
    [3, 4]
])
b = torch.tensor([
    [5, 6],
    [7, 8]
])
# 행렬 곱(matrix multiplication) 수행
print(a.matmul(b))
print(torch.matmul(a, b))

1.5.2 2) 텐서의 평균 함수

텐서의 평균(mean)을 계산할 수 있다.

a = torch.Tensor([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(a)
print(a.mean()) # 전체 원소에 대한 평균
print(a.mean(dim=0)) # 각 열에 대하여 평균 계산
print(a.mean(dim=1)) # 각 행에 대하여 평균 계산

1.5.3 3) 텐서의 합계 함수

텐서의 합계(sum)를 계산할 수 있다.


a = torch.Tensor([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(a)
print(a.sum()) # 전체 원소에 대한 합계
print(a.sum(dim=0)) # 각 열에 대하여 합계 계산
print(a.sum(dim=1)) # 각 행에 대하여 합계 계산

1.5.4 4) 텐서의 최대 함수

max() 함수는 원소의 최댓값을 반환한다.
argmax() 함수는 가장 큰 원소(최댓값)의 인덱스를 반환한다.

a = torch.Tensor([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(a)
print(a.max()) # 전체 원소에 대한 최댓값
print(a.max(dim=0)) # 각 열에 대하여 최댓값 계산
print(a.max(dim=1)) # 각 행에 대하여 최댓값 계산

a = torch.Tensor([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(a)
print(a.argmax()) # 전체 원소에 대한 최댓값의 인덱스
print(a.argmax(dim=0)) # 각 열에 대하여 최댓값의 인덱스 계산
print(a.argmax(dim=1)) # 각 행에 대하여 최댓값의 인덱스 계산

1.5.5 5) 텐서의 차원 줄이기 혹은 늘리기

unsqueeze() 함수는 크기가 1인 차원을 추가한다.
- 배치(batch) 차원을 추가하기 위한 목적으로 흔히 사용된다.
squeeze() 함수는 크기가 1인 차원을 제거한다.

a = torch.Tensor([
    [1, 2, 3, 4],
    [5, 6, 7, 8]
])
print(a.shape)

# 첫 번째 축에 차원 추가
a = a.unsqueeze(0)
print(a)
print(a.shape)

# 네 번째 축에 차원 추가
a = a.unsqueeze(3)
print(a)
print(a.shape)

# 크기가 1인 차원 제거
a = a.squeeze()
print(a)
print(a.shape)

1.6 5. 자동 미분과 기울기(Gradient)

PyTorch에서는 연산에 대하여 자동 미분을 수행할 수 있다.
각 텐서 변수에 대해 gradient추적이 가능하여 텐서 연산에 각 텐서 변수의 기울기(민감도)를 추적할 수 있다.

import torch

# requires_grad를 설정할 때만 기울기 추적
x = torch.tensor([3.0, 4.0], requires_grad=True)
y = torch.tensor([1.0, 2.0], requires_grad=True)
z = x + y #z를 연산하는데 x와 y의 민감도를 추적할 수 있다.
# x or y의 민감도 즉 gradient가 크다는 것은 변수의 값이 조금만 바뀌어도 z값에 큰 영향을 미친다는것을 의미 

print(z) # [4.0, 6.0]
print(z.grad_fn) # 더하기(add), 
# AddBackward: 기울기를 구하는 과정에서 Add를 사용한다. 뭔뜻인지? ㅋ
# Add를 연산하는 과정에서 기울기를 구하는거 아님?

out = z.mean()
print(out) # 5.0
print(out.grad_fn) # 평균(mean)

out.backward() # scalar에 대하여 모든 연산의 기울기를 추적 가능
print(x.grad) # tensor([0.5000, 0.5000]), 0.5: x의 값이 1만큼 바뀔 때 output값이 0.5만큼 바뀐다는것을 의미
print(y.grad) # tensor([0.5000, 0.5000]),
print(z.grad) # leaf variable에 대해서만 gradient 추적이 가능하다. 따라서 None.

일반적으로 모델을 학습할 때는 기울기(gradient)를 추적한다.
- 왜냐면, 가중치를 기울기에 따라 업데이트 해야하기 때문.
하지만, 학습된 모델을 사용할 때는 파라미터를 업데이트하지 않으므로, 기울기를 추적하지 않는 것이 일반적이다.

temp = torch.tensor([3.0, 4.0], requires_grad=True)# tape,라 부름. 왜?
print(temp.requires_grad)
print((temp ** 2).requires_grad)

# 기울기 추적을 하지 않기 때문에 계산 속도가 더 빠르다.
with torch.no_grad():
    temp = torch.tensor([3.0, 4.0], requires_grad=True)
    print(temp.requires_grad)
    print((temp ** 2).requires_grad)