Better way to shuffle two NumPy arrays in unison

gigabyte 2022. 9. 6. 22:20

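I have two NumPy arrays of different shapes but with the same length (leading dimension). I want to shuffle each of them so that corresponding elements keep corresponding, i.e. shuffle them in unison with respect to their leading indices.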

This code works and illustrates my goals:

import numpy

def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
    shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
    permutation = numpy.random.permutation(len(a))
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b

For example:

>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> shuffle_in_unison(a, b)
(array([[2, 2],
       [1, 1],
       [3, 3]]), array([2, 1, 3]))

However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays. I'd rather shuffle them in place, since they will be quite large.

Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.

One other thought I had was this:

import numpy

def shuffle_in_unison_scary(a, b):
    rng_state = numpy.random.get_state()  # snapshot the global RNG state
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)     # rewind so b gets the same draws
    numpy.random.shuffle(b)

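This works... but it's a little scary, as I see little guarantee that it will continue to work. It doesn't look like the kind of thing that is guaranteed to survive across NumPy versions, for instance.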

You can use NumPy's array indexing:

import numpy

def unison_shuffled_copies(a, b):
    assert len(a) == len(b)
    p = numpy.random.permutation(len(a))
    return a[p], b[p]

This creates separate arrays (copies of the originals) that are shuffled in unison.
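On NumPy 1.17 and later, the same indexing idea can be written with the numpy.random.Generator API, which keeps its own state instead of using the global one. This is a minimal sketch; the function name unison_shuffled_copies_rng and the seed parameter are illustrative additions, not part of the original answer. Like the version above, it returns copies.

```python
import numpy as np


def unison_shuffled_copies_rng(a, b, seed=None):
    """Hypothetical helper: copies of a and b shuffled with one shared permutation."""
    assert len(a) == len(b)
    rng = np.random.default_rng(seed)  # independent generator; global state untouched
    p = rng.permutation(len(a))        # one permutation of the leading indices
    return a[p], b[p]                  # integer-array indexing along axis 0 copies


a = np.asarray([[1, 1], [2, 2], [3, 3]])
b = np.asarray([1, 2, 3])
a_s, b_s = unison_shuffled_copies_rng(a, b, seed=0)
print((a_s[:, 0] == b_s).all())  # True: every row still matches its label
```

Passing the same seed twice gives the same permutation, which makes the shuffle reproducible without touching numpy.random.seed.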

Alternatively, scikit-learn provides sklearn.utils.shuffle:

import numpy as np
from sklearn.utils import shuffle

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
X, y = shuffle(X, y, random_state=0)

For more details, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html

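Your "scary" solution does not look scary to me. Calling shuffle() on two sequences of the same length results in the same number of calls to the random number generator, and these calls are the only "random" elements of the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator give the same results in the second call to shuffle(), so the whole algorithm generates the same permutation.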
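The reasoning about identical random draws can be checked directly with the get_state/set_state approach from the question. This is a small sketch with made-up data: row i of a is [i, i] and entry i of b is i, so the two stay aligned only if both receive the same permutation. It assumes, as the explanation does, that both arrays have the same length.

```python
import numpy as np

a = np.arange(10).repeat(2).reshape(10, 2)  # row i is [i, i]
b = np.arange(10)                           # label i for row i

rng_state = np.random.get_state()  # snapshot the global RNG state
np.random.shuffle(a)               # consumes some random draws
np.random.set_state(rng_state)     # rewind to the snapshot
np.random.shuffle(b)               # same length, same draws, same permutation

print((a[:, 0] == b).all())  # True
```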

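If you don't like this, a different solution is to store your data in one array instead of two right from the beginning, and create two views into this single array that simulate the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.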

Example: let's assume the arrays a and b look like this:

a = numpy.array([[[  0.,   1.,   2.],
                  [  3.,   4.,   5.]],

                 [[  6.,   7.,   8.],
                  [  9.,  10.,  11.]],

                 [[ 12.,  13.,  14.],
                  [ 15.,  16.,  17.]]])

b = numpy.array([[ 0.,  1.],
                 [ 2.,  3.],
                 [ 4.,  5.]])

We can now construct a single array containing all the data:

c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[  0.,   1.,   2.,   3.,   4.,   5.,   0.,   1.],
#        [  6.,   7.,   8.,   9.,  10.,  11.,   2.,   3.],
#        [ 12.,  13.,  14.,  15.,  16.,  17.,   4.,   5.]])

Now we create views that simulate the original a and b:

a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)

The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).

In production code, you would of course avoid creating the original a and b at all, and create c, a2 and b2 right away.
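Putting those steps together, here is a sketch that builds c from the example a and b, takes the two views, and shuffles c in place. numpy.shares_memory is used only to confirm that a2 and b2 really are views into c, so the shuffle is visible through them.

```python
import numpy as np

a = np.arange(18.0).reshape(3, 2, 3)  # same values as the example a
b = np.arange(6.0).reshape(3, 2)      # same values as the example b

# One array holding all the data, one row per leading index
c = np.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]

# Views simulating a and b (no data is copied)
k = a.size // len(a)
a2 = c[:, :k].reshape(a.shape)
b2 = c[:, k:].reshape(b.shape)
print(np.shares_memory(a2, c), np.shares_memory(b2, c))  # True True

np.random.shuffle(c)  # shuffles the rows of c in place; a2 and b2 follow

# Row i of the original a starts with 6*i and row i of b starts with 2*i
print((a2[:, 0, 0] / 6 == b2[:, 0] / 2).all())  # True
```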

This solution could be adapted to the case where a and b have different dtypes.
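One way to adapt the single-array idea to a and b with different dtypes is a structured dtype. This is a sketch under that assumption, not the original author's code. Each record holds one row of a and one row of b as subarray fields, so each field keeps its own dtype. The field names "a" and "b" are arbitrary. Accessing c["a"] returns a view, and shuffling the 1-D record array moves both fields together.

```python
import numpy as np

a = np.arange(18.0).reshape(3, 2, 3)  # float data, same values as the example a
b = np.arange(6).reshape(3, 2)        # integer data, so the dtypes differ

# One record per leading index; each field keeps its own dtype and trailing shape
dt = np.dtype([("a", a.dtype, a.shape[1:]), ("b", b.dtype, b.shape[1:])])
c = np.empty(len(a), dtype=dt)
c["a"] = a
c["b"] = b

a2, b2 = c["a"], c["b"]  # field access returns views into c
np.random.shuffle(c)     # c is 1-D, so whole records are swapped in place

print(a2.dtype, b2.dtype)  # float for a2, the platform's default integer for b2
print((a2[:, 0, 0] / 6 == b2[:, 0] / 2).all())  # True
```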

A very simple solution:

import numpy as np

# x and y are existing arrays with the same length
randomize = np.arange(len(x))
np.random.shuffle(randomize)
x = x[randomize]
y = y[randomize]

The two arrays x and y are now both shuffled randomly in the same way.

James wrote a helpful sklearn solution in 2015, but he added a random_state argument, which is not required. In the code below, NumPy's global random state is used automatically.

import numpy as np
from sklearn.utils import shuffle

X = np.array([[1., 0.], [2., 1.], [0., 0.]])
y = np.array([0, 1, 2])
X, y = shuffle(X, y)

from numpy.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data #numpy array
y = iris.target #numpy array

# Data is currently unshuffled; we should shuffle 
# each X[i] with its corresponding y[i]
perm = permutation(len(X))
X = X[perm]
y = y[perm]

Shuffle any number of arrays together, in place, using only NumPy:

import numpy as np


def shuffle_arrays(arrays, set_seed=-1):
    """Shuffles arrays in-place, in the same order, along axis=0

    Parameters:
    -----------
    arrays : List of NumPy arrays.
    set_seed : Seed value if int >= 0, else seed is random.
    """
    assert all(len(arr) == len(arrays[0]) for arr in arrays)
    seed = np.random.randint(0, 2**(32 - 1) - 1) if set_seed < 0 else set_seed

    for arr in arrays:
        rstate = np.random.RandomState(seed)
        rstate.shuffle(arr)

It can then be used like this:

a = np.array([1, 2, 3, 4, 5])
b = np.array([10,20,30,40,50])
c = np.array([[1,10,11], [2,20,22], [3,30,33], [4,40,44], [5,50,55]])

shuffle_arrays([a, b, c])

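Notes:

The assert ensures that all input arrays have the same length along their first dimension.
The arrays are shuffled in place along their first dimension, and nothing is returned.
The random seed is drawn from the positive int32 range.
If a repeatable shuffle is needed, the seed value can be set.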

After the shuffle, the data can be split using np.split or referenced using slices, depending on the application.
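shuffle_arrays relies on several RandomState objects with the same seed making the same sequence of draws for arrays of the same length. A variant that avoids that assumption draws one permutation and writes each reordered array back into its own buffer. It is sketched here with the numpy.random.Generator API (NumPy 1.17+), and the name shuffle_arrays_perm is hypothetical. The assignment arr[...] = arr[perm] still allocates a temporary copy of one array at a time, so the extra memory is bounded by the largest array rather than being zero.

```python
import numpy as np


def shuffle_arrays_perm(arrays, seed=None):
    """Hypothetical helper: shuffle every array in place along axis 0 with one permutation."""
    n = len(arrays[0])
    assert all(len(arr) == n for arr in arrays)
    perm = np.random.default_rng(seed).permutation(n)  # drawn once, reused for all arrays
    for arr in arrays:
        arr[...] = arr[perm]  # arr[perm] is a temporary copy; write it back in place


a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
c = np.array([[1, 10, 11], [2, 20, 22], [3, 30, 33], [4, 40, 44], [5, 50, 55]])
shuffle_arrays_perm([a, b, c], seed=42)
print((c[:, 0] == a).all() and (c[:, 1] == b).all())  # True
```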

You can create an index array like this:

s = np.arange(0, len(a), 1)

Then shuffle it:

np.random.shuffle(s)

Now use s as the index into your arrays. The same shuffled indices produce the same shuffled order for each array.

x_data = x_data[s]
x_label = x_label[s]

There is a well-known function that can handle this:

from sklearn.model_selection import train_test_split
X, _, Y, _ = train_test_split(X,Y, test_size=0.0)

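Setting test_size to 0 avoids splitting and just gives you the shuffled data. The function is usually used to split data into training and test sets, but it also shuffles it. Note that recent scikit-learn versions may reject test_size=0.0 with a ValueError.
From the documentation:

Split arrays or matrices into random train and test subsets

Quick utility that wraps input validation and next(ShuffleSplit().split(X, y)) and application to input data into a single call for splitting (and optionally subsampling) data in a oneliner.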

This seems like a very simple solution:

import numpy as np
def shuffle_in_unison(a,b):

    assert len(a)==len(b)
    c = np.arange(len(a))
    np.random.shuffle(c)

    return a[c],b[c]

a =  np.asarray([[1, 1], [2, 2], [3, 3]])
b =  np.asarray([11, 22, 33])

shuffle_in_unison(a,b)
Out[94]: 
(array([[3, 3],
        [2, 2],
        [1, 1]]),
 array([33, 22, 11]))

One way to shuffle connected lists in place is to use a seed (which can be random) together with numpy.random.shuffle:

import numpy as np

# Set seed to a random number if you want the shuffling to be non-deterministic.
def shuffle(a, b, seed):
   np.random.seed(seed)
   np.random.shuffle(a)
   np.random.seed(seed)
   np.random.shuffle(b)

That's it. This shuffles a and b in exactly the same way. It is also done in place, which is a plus for large arrays.

EDIT: don't use np.random.seed(); use np.random.RandomState instead.

import numpy as np

def shuffle(a, b, seed):
   rand_state = np.random.RandomState(seed)
   rand_state.shuffle(a)
   rand_state.seed(seed)
   rand_state.shuffle(b)

When calling it, just pass in any seed to feed the random state:

a = [1,2,3,4]
b = [11, 22, 33, 44]
shuffle(a, b, 12345)

Output:

>>> a
[1, 4, 2, 3]
>>> b
[11, 44, 22, 33]

Edit: fixed the code to re-seed the random state.

Suppose we have two arrays, a and b:

import numpy as np

a = np.array([[1,2,3],[4,5,6],[7,8,9]])
b = np.array([[9,1,1],[6,6,6],[4,2,0]])

We can first obtain the row indices by permuting the first dimension:

indices = np.random.permutation(a.shape[0])
# e.g. [1 2 0]

Then use advanced indexing. Here the same indices are used to shuffle both arrays in unison.

a_shuffled = a[indices[:,np.newaxis], np.arange(a.shape[1])]
b_shuffled = b[indices[:,np.newaxis], np.arange(b.shape[1])]

This is equivalent to:

np.take(a, indices, axis=0)
# [[4 5 6]
#  [7 8 9]
#  [1 2 3]]

np.take(b, indices, axis=0)
# [[6 6 6]
#  [4 2 0]
#  [9 1 1]]
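Because only the first axis is permuted, the explicit column index np.arange(a.shape[1]) is not strictly needed: plain integer-array indexing a[indices] selects the same rows. Here is a quick check with the arrays above, where indices is hard-coded to [1, 2, 0] (instead of drawn at random) so the result is deterministic.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([[9, 1, 1], [6, 6, 6], [4, 2, 0]])
indices = np.array([1, 2, 0])  # fixed instead of np.random.permutation, for reproducibility

via_advanced = a[indices[:, np.newaxis], np.arange(a.shape[1])]
via_take = np.take(a, indices, axis=0)
via_rows = a[indices]  # ordinary integer-array indexing along axis 0

print((via_advanced == via_take).all() and (via_take == via_rows).all())  # True
print(b[indices])
# [[6 6 6]
#  [4 2 0]
#  [9 1 1]]
```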

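If you want to avoid copying arrays, then instead of generating a permutation list, I would suggest going through every element in the array and randomly swapping it with another position in the array: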

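import numpy

for old_index in range(len(a)):
    # pick a random position in a[:old_index+1] to swap with
    new_index = numpy.random.randint(old_index + 1)
    # index lists copy the right-hand side, so rows of 2-D arrays swap correctly
    a[[old_index, new_index]] = a[[new_index, old_index]]
    b[[old_index, new_index]] = b[[new_index, old_index]]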

This implements the Knuth-Fisher-Yates shuffle algorithm.
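When the arrays are multi-dimensional, the swap step needs care. With a 2-D NumPy array, the usual Python idiom x[i], x[j] = x[j], x[i] goes wrong: x[i] and x[j] are views, so after the first assignment the second one copies the already-overwritten row. The same problem affects any row-by-row tuple swap, including Python's random.shuffle applied to a 2-D array. Swapping with an index list, x[[i, j]] = x[[j, i]], avoids it because the right-hand side is a copy. A small demonstration:

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
x[0], x[1] = x[1], x[0]  # the right-hand side holds views, not copies
print(x.tolist())        # [[3, 4], [3, 4]]  (row [1, 2] is lost)

y = np.array([[1, 2], [3, 4]])
y[[0, 1]] = y[[1, 0]]    # indexing with a list copies the rows first
print(y.tolist())        # [[3, 4], [1, 2]]
```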

In my opinion, the shortest and easiest way is to use a seed:

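import random

seed = 42  # any fixed integer
# x_data and y_data should be lists or 1-D arrays: random.shuffle swaps rows of 2-D NumPy arrays incorrectly
random.seed(seed)
random.shuffle(x_data)
# reset the same seed to get the identical random sequence and shuffle the y
random.seed(seed)
random.shuffle(y_data)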

For example, here is what I'm doing:

import numpy as np
from random import shuffle

# pair each image with its label so they move together
combo = list(zip(images, labels))
shuffle(combo)

# unzip the shuffled pairs back into two arrays
im, lab = zip(*combo)
images = np.asarray(im)
labels = np.asarray(lab)

I extended Python's random.shuffle() to take a second argument:

import random


def shuffle_together(x, y):
    assert len(x) == len(y)

    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random.random() * (i+1))
        # tuple swaps are fine for lists and 1-D arrays, not for rows of 2-D NumPy arrays
        x[i], x[j] = x[j], x[i]
        y[i], y[j] = y[j], y[i]

This shuffles in place, and the function is not particularly long or complicated.

Just use NumPy...

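First, combine the two input arrays (the 1-D array holds the labels y and the 2-D array holds the data x) and shuffle them with NumPy's shuffle method. Finally, split them apart again and return them.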

import numpy as np

def shuffle_2d(a, b):
    rows = a.shape[0]
    if b.shape != (rows, 1):
        b = b.reshape((rows, 1))
    # note: hstack upcasts to a common dtype, so integer labels come back as floats
    S = np.hstack((b, a))
    np.random.shuffle(S)
    b, a = S[:, 0], S[:, 1:]
    return a, b

features, samples = 2, 5
x, y = np.random.random((samples, features)), np.arange(samples)
x, y = shuffle_2d(x, y)

Most of the solutions above work, but if your samples are stored as column vectors, you have to transpose them first. Here is an example:

import numpy as np

# method of a class that stores samples as the columns of self.X and self.Y
def shuffle(self) -> None:
    """
    Shuffles X and Y
    """
    x = self.X.T
    y = self.Y.T
    p = np.random.permutation(len(x))
    self.X = x[p].T
    self.Y = y[p].T
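Instead of transposing back and forth, the permutation can also be applied along the second axis directly. This is a sketch written as a plain function rather than a method; the name shuffle_columns is hypothetical, and it assumes samples are stored as the columns of X and Y. It returns copies.

```python
import numpy as np


def shuffle_columns(X, Y, seed=None):
    """Hypothetical helper: copies of X and Y with their columns (samples) permuted together."""
    assert X.shape[1] == Y.shape[1]
    p = np.random.default_rng(seed).permutation(X.shape[1])
    return X[:, p], Y[:, p]  # index axis 1 instead of transposing


X = np.array([[1, 2, 3], [10, 20, 30]])  # 2 features x 3 samples
Y = np.array([[1, 2, 3]])                # 1 label row x 3 samples
Xs, Ys = shuffle_columns(X, Y, seed=0)
print((Xs[0] == Ys[0]).all())  # True
```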

Source URL: https://stackoverflow.com/questions/4601373/better-way-to-shuffle-two-numpy-arrays-in-unison
