Using Linear Algebra to Process Data in PyTorch

source code: NJU-ymhui/DataOperations: Use pytorch for data operations (github.com)

use git to clone: https://github.com/NJU-ymhui/DataOperations.git

linear_algebra.py tensor_operation.py

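Basic Concepts

Axes

**Each axis of a tensor corresponds to one dimension of the data.** Axes are numbered from 0, and an axis number identifies one direction of the tensor.

Example: a two-dimensional tensor can be viewed as a matrix with two axes: rows (axis 0) and columns (axis 1).

Dimensions

Here, "dimension" describes the tensor as a whole.

The number of dimensions is a property of a tensor: it says in how many directions the data structure extends. For example, a two-dimensional tensor (a matrix) has two dimensions, a three-dimensional tensor has three, and so on. Dimensions are tied to the tensor's shape, which gives the size along each direction. For example, a tensor of shape (2, 3, 4) has 3 dimensions.

Keep axes and dimensions apart: dimensions describe the structure of a tensor, while an axis is what you name when an operation needs a direction. Think of dimensions as attributes of the tensor, and of axes as the handles used to operate on those attributes.

Shape

**The shape of a tensor is a tuple giving the size along each axis.** For example, a two-dimensional tensor of shape (3, 4) has 3 rows and 4 columns.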
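
To make the three terms concrete, here is a minimal sketch that inspects a tensor of shape (2, 3, 4). The variable names are chosen only for illustration; `ndim`, `shape` and `numel()` are standard tensor members.

```python
import torch

# A 3-D tensor with shape (2, 3, 4): three axes, numbered 0, 1 and 2.
t = torch.zeros(2, 3, 4)

num_axes = t.ndim          # number of axes (dimensions)
shape = tuple(t.shape)     # size along each axis
size_axis_1 = t.shape[1]   # size along axis 1
total = t.numel()          # total number of elements: 2 * 3 * 4

print(num_axes, shape, size_axis_1, total)
```

The shape is indexed by axis number, so `t.shape[1]` is the size along axis 1.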

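Scalars

In PyTorch, a scalar is represented by a tensor that holds a single element and has no axes (a zero-dimensional tensor). It supports the usual arithmetic operations.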

code

def scalar():
    x = torch.tensor(3)  # create the scalar 3
    y = torch.tensor(2.5)  # create the scalar 2.5
    z = torch.tensor(1.)
    print(x, y, z)
    # calculate
    print(x + y + z, x * y / z, x ** y, x % z)
    
    # differ
    x = torch.tensor([3])  # note: this is not a scalar but a one-dimensional tensor of length 1
    print(x)

output

tensor(3) tensor(2.5000) tensor(1.)
tensor(6.5000) tensor(7.5000) tensor(15.5885) tensor(0.)
tensor([3])

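Note: x = torch.tensor(3) and x = torch.tensor([3]) mean different things. The former creates a scalar (although it is still represented as a tensor), whereas the latter creates a genuine one-dimensional tensor that happens to contain a single element.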
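
The difference can be checked directly. The sketch below compares the number of axes and the shape of the two tensors, and uses `item()` to pull the value out as a plain Python number; the variable names are illustrative only.

```python
import torch

s = torch.tensor(3)    # zero-dimensional tensor: a scalar
v = torch.tensor([3])  # one-dimensional tensor with a single element

print(s.ndim, s.shape)  # no axes, empty shape
print(v.ndim, v.shape)  # one axis of size 1

# item() extracts the single value as a plain Python number in both cases
s_value = s.item()
v_value = v.item()
print(s_value, v_value)
```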

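Vectors

A vector can be viewed as a list of scalars. In PyTorch, a vector is represented by a one-dimensional tensor.

As in plain Python, elements of a vector can be accessed by index (covered in the data operations article).

Length and Dimensionality

Tensors in PyTorch provide the size() method and the shape attribute for obtaining the length, that is the dimensionality, of a vector (a one-dimensional tensor). Because a vector always has exactly one axis, its shape contains a single number.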

code

def vector():
    vec = torch.arange(4)  # create a one-dimensional tensor of length 4 as the vector; arange(4) yields the sequence 0 to 3
    print(vec)
    # visit element in vec
    print(vec[0], vec[1], vec[2], vec[3])
    # change element in vec
    vec[0] = 1
    print(vec[0])
    print(vec)
    # show the length, dim of vec
    print("size:", vec.size(), "shape:", vec.shape)  # vec is a one-dimensional tensor, so its shape is just the dimensionality of the vector
    # use len()
    print("size by len():", len(vec))

output

tensor([0, 1, 2, 3])
tensor(0) tensor(1) tensor(2) tensor(3)
tensor(1)
tensor([1, 1, 2, 3])
size: torch.Size([4]) shape: torch.Size([4])
size by len(): 4

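Matrices

Just as vectors generalize scalars, matrices generalize vectors: a matrix can be viewed as a collection of vectors.

The reshape() method turns a one-dimensional tensor into a multi-dimensional one, which is a natural way to build a matrix.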

code

def matrix():
    mat = torch.arange(20).reshape(5, 4)  # create a 5 x 4 matrix holding 0 to 19
    print(mat)
    # access the transpose through the matrix's T attribute
    print("transpose of matrix:")
    print(mat.T)

output

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])
transpose of matrix:
tensor([[ 0,  4,  8, 12, 16],
        [ 1,  5,  9, 13, 17],
        [ 2,  6, 10, 14, 18],
        [ 3,  7, 11, 15, 19]])
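
Two quick properties of the transpose can be checked with a short sketch: transposing swaps the two axis sizes, and transposing twice gives back the original matrix. The symmetric matrix `sym` is a made-up example of a matrix that equals its own transpose.

```python
import torch

mat = torch.arange(20).reshape(5, 4)
print(mat.shape, mat.T.shape)  # the two axis sizes are swapped

# Transposing twice gives back the original matrix
same_after_double_transpose = torch.equal(mat.T.T, mat)

# A symmetric matrix equals its own transpose
sym = torch.tensor([[1, 2, 3], [2, 0, 4], [3, 4, 5]])
is_symmetric = torch.equal(sym, sym.T)
print(same_after_double_transpose, is_symmetric)
```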

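Tensors

Now we can formally introduce tensors. Just as vectors generalize scalars and matrices generalize vectors, tensors generalize matrices: they carry the data into more dimensions and have more axes.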

code

def tensor():
    print("3 * 4 * 5:")
    tens = torch.arange(60).reshape(3, 4, 5)  # create a three-dimensional tensor: 3 matrices, each with 4 rows and 5 columns
    print(tens)
    print("2 x 3 x 4 x 5:")
    tens2 = torch.arange(120).reshape(2, 3, 4, 5)  # create a four-dimensional tensor: 2 groups of 3 matrices, each with 4 rows and 5 columns
    print(tens2)

output

3 * 4 * 5:
tensor([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9],
         [10, 11, 12, 13, 14],
         [15, 16, 17, 18, 19]],

        [[20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29],
         [30, 31, 32, 33, 34],
         [35, 36, 37, 38, 39]],

        [[40, 41, 42, 43, 44],
         [45, 46, 47, 48, 49],
         [50, 51, 52, 53, 54],
         [55, 56, 57, 58, 59]]])
2 x 3 x 4 x 5:
tensor([[[[  0,   1,   2,   3,   4],
          [  5,   6,   7,   8,   9],
          [ 10,  11,  12,  13,  14],
          [ 15,  16,  17,  18,  19]],

         [[ 20,  21,  22,  23,  24],
          [ 25,  26,  27,  28,  29],
          [ 30,  31,  32,  33,  34],
          [ 35,  36,  37,  38,  39]],

         [[ 40,  41,  42,  43,  44],
          [ 45,  46,  47,  48,  49],
          [ 50,  51,  52,  53,  54],
          [ 55,  56,  57,  58,  59]]],


        [[[ 60,  61,  62,  63,  64],
          [ 65,  66,  67,  68,  69],
          [ 70,  71,  72,  73,  74],
          [ 75,  76,  77,  78,  79]],

         [[ 80,  81,  82,  83,  84],
          [ 85,  86,  87,  88,  89],
          [ 90,  91,  92,  93,  94],
          [ 95,  96,  97,  98,  99]],

         [[100, 101, 102, 103, 104],
          [105, 106, 107, 108, 109],
          [110, 111, 112, 113, 114],
          [115, 116, 117, 118, 119]]]])

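Properties of Tensor Arithmetic in PyTorch

The clone() Method

A tensor's clone() method allocates new memory and copies the full contents of the cloned tensor into it.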

code

def clone():
    a = torch.arange(20).reshape(4, 5)
    b = a.clone()
    print("memory same?")
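    print(a.data_ptr() == b.data_ptr())  # compare the addresses of the underlying data; id() would only compare the Python objects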
    print("context same?")
    print(a == b)
    print("context:")
    print("a:")
    print(a)
    print("a + a.clone():")
    print(a + b)

output

memory same?
False
context same?
tensor([[True, True, True, True, True],
        [True, True, True, True, True],
        [True, True, True, True, True],
        [True, True, True, True, True]])
context:
a:
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]])
a + a.clone():
tensor([[ 0,  2,  4,  6,  8],
        [10, 12, 14, 16, 18],
        [20, 22, 24, 26, 28],
        [30, 32, 34, 36, 38]])
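
Why the copy matters shows up once the original is modified. The sketch below contrasts plain assignment, which only creates a second name for the same tensor, with clone(); the names `alias` and `copy` are illustrative.

```python
import torch

a = torch.arange(6).reshape(2, 3)
alias = a          # plain assignment: both names refer to the same tensor
copy = a.clone()   # clone(): new memory holding the same values

a[0, 0] = 100      # modify the original in place

print(alias[0, 0])  # the alias sees the change
print(copy[0, 0])   # the clone keeps the old value
print(a.data_ptr() == alias.data_ptr(), a.data_ptr() == copy.data_ptr())
```

Here `data_ptr()` returns the address of the first element of the underlying data, so it is equal for the alias and different for the clone.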

Tensor Multiplication

The * operator on tensors multiplies element by element (covered in the data operations article).

code

def multiply():
    a = torch.arange(4).reshape(2, 2)
    b = torch.arange(4).reshape(2, 2)
    print(a)
    print(b)
    print(a * b)

output

tensor([[0, 1],
        [2, 3]])
tensor([[0, 1],
        [2, 3]])
tensor([[0, 1],
        [4, 9]])
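
Element-wise multiplication is easy to confuse with the matrix product. As a small sketch, the same 2 x 2 matrix is multiplied with itself both ways, using torch.mm (covered in the matrix multiplication section of this article) for the matrix product.

```python
import torch

a = torch.arange(4).reshape(2, 2)  # [[0, 1], [2, 3]]

elementwise = a * a           # multiplies matching entries
matrix_prod = torch.mm(a, a)  # rows of the left matrix times columns of the right one

print(elementwise)
print(matrix_prod)
```

The two results differ: the element-wise product is [[0, 1], [4, 9]], while the matrix product is [[2, 3], [6, 11]].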

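Operations with a Scalar

When a tensor is combined with a scalar, the operation is applied between the scalar and every element of the tensor.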

code

def tensor_with_scalar():
    mat = torch.arange(6).reshape(2, 3)  # create a 2 x 3 matrix
    sca = torch.tensor(2)  # create the scalar 2
    print(mat)
    print(sca)
    print(mat * sca)
    print(mat + sca)

output

tensor([[0, 1, 2],
        [3, 4, 5]])
tensor(2)
tensor([[ 0,  2,  4],
        [ 6,  8, 10]])
tensor([[2, 3, 4],
        [5, 6, 7]])

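Reduction

Sum

The var.sum(...) method computes the sum of the elements of a tensor.

By default, the sum adds up every element and collapses the tensor to a scalar. The axis parameter instead specifies the axis along which to sum, so that only that axis is removed.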

code

def sum_reduce_dim():
    mat = torch.arange(20).reshape(4, 5)
    vec = torch.arange(4)
    print(mat)
    print(vec)
    # by default all elements are summed and the result is a scalar
    print(mat.sum())
    print(vec.sum())
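    # sum along a given axis
    # using the matrix as the example
    print(mat.sum(axis=0))  # sum along axis 0 (down the rows): for each column, add up its entries over all rows, giving the column sums as a vector
    print(mat.sum(axis=1))  # sum along axis 1 (across the columns): for each row, add up its entries over all columns, giving the row sums
    print(mat.sum(axis=[0, 1]))  # reduce along every axis, i.e. the total sum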

output

tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]])
tensor([0, 1, 2, 3])
tensor(190)
tensor(6)
tensor([30, 34, 38, 42, 46])
tensor([10, 35, 60, 85])
tensor(190)
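
The same rule carries over to tensors with more axes: summing along an axis removes exactly that axis from the shape. A minimal sketch with a tensor of shape (2, 3, 4), chosen only for illustration:

```python
import torch

t = torch.arange(24).reshape(2, 3, 4)

# Summing along an axis removes that axis from the shape
shape_axis0 = tuple(t.sum(axis=0).shape)        # axis 0 (size 2) disappears
shape_axis1 = tuple(t.sum(axis=1).shape)        # axis 1 (size 3) disappears
shape_axis2 = tuple(t.sum(axis=2).shape)        # axis 2 (size 4) disappears
shape_axes01 = tuple(t.sum(axis=[0, 1]).shape)  # axes 0 and 1 both disappear

print(shape_axis0, shape_axis1, shape_axis2, shape_axes01)
print(t.sum())  # no axis given: everything collapses to a scalar
```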

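Mean

The var.mean() method computes the average of all elements. As with sum, an axis can be given to average along.

Note: mean() expects the tensor elements to be of a floating-point or complex type.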

code

def mean():
    mat = torch.arange(20, dtype=torch.float32).reshape(4, 5)
    print(mat)
    print(mat.mean())  # the result is a scalar
    print(mat.sum() / mat.numel())
    # along a given axis
    print("mean of column:")
    print(mat.mean(axis=0))  # mean of each column; the result is a vector
    print(mat.sum(axis=0) / mat.shape[0])
    print("mean of row:")
    print(mat.mean(axis=1))  # mean of each row
    print(mat.sum(axis=1) / mat.shape[1])

output

tensor([[ 0.,  1.,  2.,  3.,  4.],
        [ 5.,  6.,  7.,  8.,  9.],
        [10., 11., 12., 13., 14.],
        [15., 16., 17., 18., 19.]])
tensor(9.5000)
tensor(9.5000)
mean of column:
tensor([ 7.5000,  8.5000,  9.5000, 10.5000, 11.5000])
tensor([ 7.5000,  8.5000,  9.5000, 10.5000, 11.5000])
mean of row:
tensor([ 2.,  7., 12., 17.])
tensor([ 2.,  7., 12., 17.])
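
The dtype requirement of mean() is easy to trip over, because torch.arange produces integers by default. The sketch below shows the failure on an integer tensor and one possible fix, converting with float() first; the exact error text may differ between PyTorch versions.

```python
import torch

ints = torch.arange(4)  # dtype is torch.int64 by default

try:
    ints.mean()
    raised = False
except RuntimeError as err:
    raised = True
    print("mean() failed on an integer tensor:", err)

# Converting to a floating-point dtype first makes the call valid
avg = ints.float().mean()
print(avg)
```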

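Non-Reducing Sum

Sometimes we want the result of a sum or mean to keep the same number of axes as the input; the keepdims parameter does this. Because the reduced axis is kept with size 1, the result can be combined with the original tensor through broadcasting.

To compute the cumulative sum of the elements along some axis, for example axis=0 (accumulating row by row), call the cumsum() function. It does not reduce the input tensor along any axis.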

code

def keep_dims():
    mat = torch.arange(20).reshape(4, 5)
    print("mat:")
    print(mat)
    print("sum of column:")
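    print(mat.sum(axis=0, keepdims=True))  # still the sum of each column, but the result remains a two-dimensional tensor
    print("sum of row:")
    print(mat.sum(axis=1, keepdims=True))
    # the result still has two axes, so it can be combined with the original tensor
    tmp = mat.sum(axis=1, keepdims=True)  # tmp holds the sum of each row
    print("tmp:")
    print(tmp)
    # use broadcasting to divide the original matrix by its row sums
    print("broadcast for mat / tmp:")
    print(mat / tmp)
    # cumulative sum of the matrix elements along axis 0 (down the rows)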
    print("cumulative value of mat by row:")
    print(mat.cumsum(axis=0))

output

mat:
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]])
sum of column:
tensor([[30, 34, 38, 42, 46]])
sum of row:
tensor([[10],
        [35],
        [60],
        [85]])
tmp:
tensor([[10],
        [35],
        [60],
        [85]])
broadcast for mat / tmp:
tensor([[0.0000, 0.1000, 0.2000, 0.3000, 0.4000],
        [0.1429, 0.1714, 0.2000, 0.2286, 0.2571],
        [0.1667, 0.1833, 0.2000, 0.2167, 0.2333],
        [0.1765, 0.1882, 0.2000, 0.2118, 0.2235]])
cumulative value of mat by row:
tensor([[ 0,  1,  2,  3,  4],
        [ 5,  7,  9, 11, 13],
        [15, 18, 21, 24, 27],
        [30, 34, 38, 42, 46]])
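
The effect of keepdims is clearest when the shapes are compared side by side. In this sketch (a small 2 x 3 example chosen for illustration), the kept axis of size 1 is what allows each row to be divided by its own sum.

```python
import torch

mat = torch.arange(1, 7, dtype=torch.float32).reshape(2, 3)

row_sum_flat = mat.sum(axis=1)                 # shape (2,)
row_sum_kept = mat.sum(axis=1, keepdims=True)  # shape (2, 1)

# (2, 3) / (2, 1) broadcasts across columns: every row is divided by its own sum
normalized = mat / row_sum_kept
print(row_sum_flat.shape, row_sum_kept.shape)
print(normalized.sum(axis=1))  # each row now sums to 1, up to rounding
```

Dividing by `row_sum_flat` instead would fail here, because shapes (2, 3) and (2,) cannot be broadcast together.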

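Dot Product

The dot product of two vectors is computed with torch.dot(...). By the definition of the dot product, the same value is obtained by multiplying the vectors element by element and then summing the result.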

code

def dot_product():
    a = torch.arange(4)
    b = torch.tensor([1, 2, 5, 9])
    print(a)
    print(b)
    print("dot product:")
    print(torch.dot(a, b))
    print(torch.sum(a * b))
    print((a * b).sum())

output

tensor([0, 1, 2, 3])
tensor([1, 2, 5, 9])
dot product:
tensor(39)
tensor(39)
tensor(39)
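
One common use of the dot product is a weighted average: when the weights are non-negative and sum to 1, the dot product of the values with the weights is their weighted mean. The numbers below are made up for illustration.

```python
import torch

values = torch.tensor([1.0, 2.0, 3.0, 4.0])
weights = torch.tensor([0.1, 0.2, 0.3, 0.4])  # non-negative, sum to 1

weighted_avg = torch.dot(values, weights)  # 0.1 + 0.4 + 0.9 + 1.6
same_with_operator = values @ weights      # @ on two 1-D tensors is also a dot product
print(weighted_avg, same_with_operator)
```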

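Matrix Multiplication

Matrix-Vector Product

That is, multiplying a matrix by a vector: A~mn~x~n~. Note that the number of columns of the matrix (its size along axis 1) must equal the dimensionality of the vector.

torch.mv(...): computes the matrix-vector product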

code

def matrix_mul_vector():
    mat = torch.arange(20).reshape(5, 4)  # create a 5 x 4 matrix
    vec = torch.arange(4)  # create a 4-dimensional vector
    print(mat)
    print(mat.shape)
    print(vec)
    print(vec.shape)
    print("mat * vec:")
    res = torch.mv(mat, vec)  # matrix times vector
    print(res)
    print(res.shape)

output

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])
torch.Size([5, 4])
tensor([0, 1, 2, 3])
torch.Size([4])
mat * vec:
tensor([ 14,  38,  62,  86, 110])
torch.Size([5])
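
The matrix-vector product ties back to the dot product: entry i of the result is the dot product of row i of the matrix with the vector. The sketch below rebuilds the result that way and also with the @ operator.

```python
import torch

mat = torch.arange(20).reshape(5, 4)
vec = torch.arange(4)

by_mv = torch.mv(mat, vec)
# Entry i of the result is the dot product of row i of the matrix with the vector
by_rows = torch.stack([torch.dot(row, vec) for row in mat])
by_operator = mat @ vec  # for a 2-D and a 1-D tensor, @ gives the same product

print(by_mv, by_rows, by_operator)
```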

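Matrix-Matrix Product

A~mn~B~nk~. Note that the number of columns of the left matrix (its size along axis 1) must equal the number of rows of the right matrix (its size along axis 0).

torch.mm(...): multiplies two matrices (two-dimensional tensors only)

torch.matmul(...): multiplies tensors of any number of dimensions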

code

def matrix_mul_matrix():
    mat1 = torch.arange(20).reshape(5, 4)  # create a 5 x 4 matrix
    mat2 = torch.arange(12).reshape(4, 3)  # create a 4 x 3 matrix
    print(mat1)
    print(mat1.shape)
    print(mat2)
    print(mat2.shape)
    print("mat1 * mat2:")
    res = torch.mm(mat1, mat2)
    print(res)
    print(res.shape)

output

tensor([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15],
        [16, 17, 18, 19]])
torch.Size([5, 4])
tensor([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]])
torch.Size([4, 3])
mat1 * mat2:
tensor([[ 42,  48,  54],
        [114, 136, 158],
        [186, 224, 262],
        [258, 312, 366],
        [330, 400, 470]])
torch.Size([5, 3])
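
To show what torch.matmul adds over torch.mm, here is a sketch with a stack of matrices. The three-dimensional `batch` tensor is an assumption made for the example; matmul treats its leading axis as a batch axis and multiplies each 5 x 4 slice by the same 4 x 3 matrix.

```python
import torch

batch = torch.arange(40).reshape(2, 5, 4)  # two 5 x 4 matrices stacked along axis 0
mat2 = torch.arange(12).reshape(4, 3)      # one 4 x 3 matrix

# matmul broadcasts mat2 over the leading (batch) axis
res = torch.matmul(batch, mat2)
print(res.shape)

# Each slice matches the plain 2-D product computed with torch.mm
first_matches = torch.equal(res[0], torch.mm(batch[0], mat2))
second_matches = torch.equal(res[1], torch.mm(batch[1], mat2))
print(first_matches, second_matches)
```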

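Norms

Informally, a norm measures how large a vector is. Two commonly used norms are the L~1~ norm and the L~2~ norm.

L~1~

The sum of the absolute values of the vector's elements: $\sum_{i=1}^{n}|x_i|$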

code

def L1():
    vec = torch.tensor([1, -2, 3, -4])
    print(vec)
    print("L1:")
    print(torch.abs(vec).sum())

output

tensor([ 1, -2,  3, -4])
L1:
tensor(10)

L~2~

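The distance from the vector to the origin of the linear space (much like the distance formula): $\sqrt{\sum_{i=1}^{n}x_i^2}$. The L~2~ norm is the one used more often in deep learning.

torch.norm(...): computes the L~2~ norm; note that the tensor elements must be of a floating-point or complex type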

code

def L2():
    vec = torch.tensor([1., -2, 3, -4])
    print(vec)
    print("L2:")
    print(torch.norm(vec))
    print(torch.sqrt(torch.pow(vec, 2).sum()))  # apply the formula by hand

output

tensor([ 1., -2.,  3., -4.])
L2:
tensor(5.4772)
tensor(5.4772)

L~p~ Norm

$\|\mathbf{x}\|_{p}=\left(\sum_{i=1}^{n}|x_i|^p\right)^{1/p}$
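
The general formula can be written out directly. In the sketch below, `lp_norm` is a hypothetical helper introduced only for this example; it is checked against torch.linalg.vector_norm, which takes the order p through its `ord` argument.

```python
import torch

def lp_norm(x: torch.Tensor, p: float) -> torch.Tensor:
    """Hypothetical helper: the L_p norm computed straight from the formula."""
    return torch.abs(x).pow(p).sum().pow(1.0 / p)

vec = torch.tensor([1., -2., 3., -4.])

l1 = lp_norm(vec, 1)  # 1 + 2 + 3 + 4
l2 = lp_norm(vec, 2)  # sqrt(1 + 4 + 9 + 16)
l3 = lp_norm(vec, 3)  # (1 + 8 + 27 + 64) ** (1/3)
print(l1, l2, l3)

# Built-in counterpart for comparison
l3_builtin = torch.linalg.vector_norm(vec, ord=3)
print(l3_builtin)
```

With p = 1 and p = 2 the helper reproduces the L~1~ and L~2~ norms from the sections above.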

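L~2~ Norm of a Matrix

The Frobenius norm of a matrix is the square root of the sum of the squares of all its elements. It plays the role of the L~2~ norm for matrices: it equals the L~2~ norm of the matrix flattened into a vector.

$\|\mathbf{X}\|_F=\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n}x_{ij}^2}$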

code

def matrix_l2():
    mat = torch.arange(12, dtype=torch.float32).reshape(4, 3)
    print(mat)
    print("mat's L2 norm")
    print(torch.norm(mat))
    print(torch.sqrt(torch.pow(mat, 2).sum()))  # apply the formula by hand

output

tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.],
        [ 6.,  7.,  8.],
        [ 9., 10., 11.]])
mat's L2 norm
tensor(22.4944)
tensor(22.4944)
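
As a cross-check, the Frobenius norm can also be requested explicitly through torch.linalg.matrix_norm. The sketch below compares it with the L~2~ norm of the flattened matrix and, for contrast, with the spectral norm (`ord=2`, the largest singular value), which is a different quantity that is sometimes also called the matrix 2-norm.

```python
import torch

mat = torch.arange(12, dtype=torch.float32).reshape(4, 3)

fro = torch.linalg.matrix_norm(mat, ord='fro')     # Frobenius norm
flat_l2 = torch.linalg.vector_norm(mat.flatten())  # L2 norm of the flattened matrix
spectral = torch.linalg.matrix_norm(mat, ord=2)    # largest singular value

print(fro, flat_l2, spectral)
```

The first two values agree, while the spectral norm is never larger than the Frobenius norm.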

(•‿•)