Cutting audio data

2019. 3. 19. 13:59


I'm going to process audio data (speech, sound, etc.) with Python, for example setting the sampling rate or cutting out a portion, and then save the result.

The library used here is librosa.

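First, take the path of the audio file in the directory where it is stored.

The code below splits that path and prints file_dir (the directory) and file_id (the file name).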

import os

wav = '/data/dataset/IEMOCAP_wavonly/IEMOCAP_Wavonly/Wav/Ses01F_impro01/Ses01F_impro01_F005.wav'
(file_dir, file_id) = os.path.split(wav)  # split into directory and file name
print("file_dir:", file_dir)
print("file_id:", file_id)

- Result

file_dir: /data/dataset/IEMOCAP_wavonly/IEMOCAP_Wavonly/Wav/Ses01F_impro01

file_id: Ses01F_impro01_F005.wav
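If you also want the file name without the .wav extension (for example to name the saved plot), os.path.splitext can strip it. A minimal sketch using only the standard library; split_wav_path is a hypothetical helper name, not part of the original code:

```python
import os

def split_wav_path(path):
    """Return (directory, file name, file name without extension)."""
    file_dir, file_id = os.path.split(path)
    stem, _ext = os.path.splitext(file_id)  # drop the ".wav" suffix
    return file_dir, file_id, stem

wav = '/data/dataset/IEMOCAP_wavonly/IEMOCAP_Wavonly/Wav/Ses01F_impro01/Ses01F_impro01_F005.wav'
file_dir, file_id, stem = split_wav_path(wav)
print(stem)  # Ses01F_impro01_F005
```

Using stem + '.png' as the figure name would avoid a double extension like Ses01F_impro01_F005.wav.png.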


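Next, load this audio file with librosa.

If sr (sampling rate) is not specified, librosa.load uses a default sr of 22050 and resamples the audio to that rate.

The file I'm loading is already 16000 Hz, so be sure to pass sr=16000 (sr=None would also keep the file's native rate).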
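To see why the sr argument matters: resampling changes the number of samples, while the duration stays the same as long as the returned sr is used. A small arithmetic sketch in plain Python, assuming a clip of about 4 seconds like the one below; n_samples is a hypothetical helper:

```python
def n_samples(duration_s, sr):
    """Number of samples needed to hold `duration_s` seconds at `sr` Hz."""
    return round(duration_s * sr)

duration = 4.0                        # roughly the length of the example clip
native = n_samples(duration, 16000)   # sr=16000: the file's own rate
default = n_samples(duration, 22050)  # sr omitted: resampled to 22050
print(native, default)                # 64000 88200
print(native / 16000, default / 22050)  # 4.0 4.0 (same duration)
# If the 22050 Hz array were later saved as if it were 16000 Hz:
print(default / 16000)                # 5.5125 (plays slower and lower)
```

This is the kind of mismatch that makes a saved file sound stretched when the rate passed to the writer differs from the rate the array was loaded at.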

import librosa
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load(wav, sr=16000)  # keep the file's native 16 kHz
time = np.linspace(0, len(y)/sr, len(y))  # time axis
fig, ax1 = plt.subplots()  # plot
ax1.plot(time, y, color='b', label='speech waveform')
ax1.set_ylabel("Amplitude")  # y-axis
ax1.set_xlabel("Time [s]")  # x-axis
plt.title(file_id)  # title
plt.savefig(file_id+'.png')
plt.show()

The plot of this audio file is shown below.

It is an audio file about 4 seconds long.
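The length in seconds is simply len(y) / sr. One small detail in the plotting code: np.linspace(0, len(y)/sr, len(y)) puts the last point exactly at len(y)/sr, so its spacing is slightly larger than 1/sr, whereas np.arange(len(y)) / sr gives the exact time of each sample. A sketch with a synthetic signal standing in for the real file (assumption: 16 kHz mono, 4 seconds):

```python
import numpy as np

sr = 16000
y = np.zeros(4 * sr, dtype=np.float32)  # synthetic stand-in for the 4 s clip

duration = len(y) / sr                            # length in seconds
t_exact = np.arange(len(y)) / sr                  # sample k is at k / sr seconds
t_linspace = np.linspace(0, len(y) / sr, len(y))  # last point lands on 4.0

print(duration)  # 4.0
print(t_exact[-1], t_linspace[-1])  # last sample time: just under 4.0 vs exactly 4.0
```

For plotting a waveform the difference is invisible; it only matters if the time axis is used for further computation.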

Now let's keep only the first half of this audio file and save it again.

First, use the code below to take the first half of the signal and plot it.

half = len(y) // 2  # midpoint as an integer sample index
y2 = y[:half]  # first half of the signal
time2 = np.linspace(0, len(y2)/sr, len(y2))
fig2, ax2 = plt.subplots()
ax2.plot(time2, y2, color='b', label='speech waveform')
ax2.set_ylabel("Amplitude")  # y-axis (set on ax2, the new figure)
ax2.set_xlabel("Time [s]")  # x-axis
plt.title('cut '+file_id)
plt.savefig('cut_half '+file_id+'.png')
plt.show()

The result is shown in the figure below. Roughly the second half has been cut off.
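The same slicing works for any time range once seconds are converted to sample indices (index = seconds × sr). A minimal sketch; cut_segment is a hypothetical helper, not a librosa function, and the input is a synthetic 4-second array:

```python
import numpy as np

def cut_segment(y, sr, start_s, end_s=None):
    """Return the samples of y between start_s and end_s (in seconds)."""
    start = int(round(start_s * sr))
    end = len(y) if end_s is None else int(round(end_s * sr))
    return y[start:end]

sr = 16000
y = np.arange(4 * sr, dtype=np.float32)  # synthetic 4 s signal
first_half = cut_segment(y, sr, 0, len(y) / sr / 2)
middle = cut_segment(y, sr, 1.0, 2.5)
print(len(first_half), len(middle))  # 32000 24000
```

Cutting the first half this way gives the same samples as y[:len(y) // 2].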

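Now let's save this audio. In the librosa version I used, librosa.output.write_wav made this a one-liner. That function was removed in librosa 0.8.0, so with current versions use the soundfile package (installed alongside librosa) instead:

# librosa < 0.8 only:
# librosa.output.write_wav('cut_file.wav', y2, sr)
import soundfile as sf
sf.write('cut_file.wav', y2, sr)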
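If neither librosa.output nor soundfile is available, a float signal in [-1, 1] can also be written as 16-bit PCM WAV with Python's built-in wave module. A sketch under that assumption (mono input; write_wav_int16 is a hypothetical helper name):

```python
import wave
import numpy as np

def write_wav_int16(path, y, sr):
    """Write a mono float signal in [-1, 1] to a 16-bit PCM WAV file."""
    pcm = (np.clip(y, -1.0, 1.0) * 32767).astype('<i2')  # little-endian int16
    with wave.open(path, 'wb') as f:
        f.setnchannels(1)  # mono
        f.setsampwidth(2)  # 2 bytes = 16 bits per sample
        f.setframerate(sr)
        f.writeframes(pcm.tobytes())

# Usage: 0.5 s of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr // 2) / sr
write_wav_int16('tone.wav', 0.3 * np.sin(2 * np.pi * 440 * t), sr)
```

The trade-off is quantization to 16 bits, which is normally fine for listening and for speech datasets stored as 16-bit PCM.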

Below you can compare the original audio with the half-cut audio.

Original audio:

original_file.mp3

Half-cut audio:

cut_file.mp3
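To check the cut without listening, the durations can be read from the WAV headers. A sketch with the standard wave module; it works for PCM WAV files but not for 32-bit float WAV, and wav_duration plus the two demo files are hypothetical:

```python
import wave

def wav_duration(path):
    """Duration in seconds of a PCM WAV file, read from its header."""
    with wave.open(path, 'rb') as f:
        return f.getnframes() / f.getframerate()

# Usage with two synthetic files: a 4 s "original" and its 2 s "half"
for name, seconds in [('original_demo.wav', 4), ('cut_demo.wav', 2)]:
    with wave.open(name, 'wb') as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(16000)
        f.writeframes(b'\x00\x00' * 16000 * seconds)  # silence

print(wav_duration('original_demo.wav'), wav_duration('cut_demo.wav'))  # 4.0 2.0
```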

Full code

import os

import librosa
import matplotlib.pyplot as plt
import numpy as np
import soundfile as sf  # librosa.output.write_wav was removed in librosa 0.8


wav = '/data/dataset/IEMOCAP_wavonly/IEMOCAP_Wavonly/Wav/Ses01F_impro01/Ses01F_impro01_F005.wav'
(file_dir, file_id) = os.path.split(wav)
print("file_dir:", file_dir)
print("file_id:", file_id)

# original
y, sr = librosa.load(wav, sr=16000)  # keep the file's native 16 kHz
time = np.linspace(0, len(y)/sr, len(y))  # time axis
fig, ax1 = plt.subplots()  # plot
ax1.plot(time, y, color='b', label='speech waveform')
ax1.set_ylabel("Amplitude")  # y-axis
ax1.set_xlabel("Time [s]")  # x-axis
plt.title(file_id)  # title
plt.savefig(file_id+'.png')
plt.show()
sf.write('original_file.wav', y, sr)  # save the original as a WAV file

# cut half and save
half = len(y) // 2  # midpoint as an integer sample index
y2 = y[:half]
time2 = np.linspace(0, len(y2)/sr, len(y2))
fig2, ax2 = plt.subplots()
ax2.plot(time2, y2, color='b', label='speech waveform')
ax2.set_ylabel("Amplitude")  # y-axis
ax2.set_xlabel("Time [s]")  # x-axis
plt.title('cut '+file_id)
plt.savefig('cut_half '+file_id+'.png')
plt.show()
sf.write('cut_file.wav', y2, sr)  # save the half-cut file


License: Creative Commons Attribution


Kaen's Ritus
