Referências Bibliográficas
ALMUTAIRI, Zaynab; ELGIBREEN, Hebah. A review of modern audio deepfake detection methods: Challenges and future directions. Algorithms, v. 15, n. 5, p. 155, 2022. Disponível em: https://www.mdpi.com/1999-4893/15/5/155. Acesso em: 9 mar. 2024.
ARDILA, Rosana et al. Common voice: A massively-multilingual speech corpus. arXiv preprint arXiv:1912.06670, 2019. Disponível em: https://arxiv.org/pdf/1912.06670. Acesso em: 10 mar. 2024.
AUDIO Corpora - Bases de áudio. GitLab. Disponível em: https://gitlab.com/fb-audio-corpora. Acesso em: 7 mar. 2024.
BRITT, Kaeli. How are deepfakes dangerous? Nevada Today, 2023. Disponível em: https://www.unr.edu/nevada-today/news/2023/atp-deepfakes. Acesso em: 15 jun. 2024.
CASANOVA, Edresson et al. XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model. arXiv preprint arXiv:2406.04904, 2024. Disponível em: https://arxiv.org/pdf/2406.04904. Acesso em: 11 jun. 2024.
DISTRIBUIÇÃO normal. In: WIKIPÉDIA: a enciclopédia livre. Disponível em: https://pt.wikipedia.org/wiki/Distribuição_normal. Acesso em: 30 abr. 2024.
DOSOVITSKIY, Alexey et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020. Disponível em: https://arxiv.org/pdf/2010.11929. Acesso em: 20 jul. 2024.
FUKUSHIMA, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, v. 36, n. 4, p. 193–202, abr. 1980. Disponível em: https://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf. Acesso em: 25 abr. 2024.
GÖLGE, Eren et al. Coqui TTS (Version 1.4) [Computer software]. Zenodo, 2022. Disponível em: https://doi.org/10.5281/zenodo.6334862. Acesso em: 15 mai. 2024.
GOODFELLOW, Ian; BENGIO, Yoshua; COURVILLE, Aaron. Deep Learning. MIT Press, 2016. Disponível em: https://www.deeplearningbook.org/. Acesso em: 15 abr. 2024.
HAQUE, Ayaan. Artificial Data for Image Classification. Towards Data Science, 2020. Disponível em: https://towardsdatascience.com/artificial-data-for-image-classification-5b2ede40640f. Acesso em: 25 mai. 2024.
HAQUE, Ayaan. EC-GAN: Low-Sample Classification using Semi-Supervised Algorithms and GANs. arXiv preprint arXiv:2012.15864, 2021. Disponível em: https://arxiv.org/pdf/2012.15864. Acesso em: 25 mai. 2024.
HONG, T. J. Uncovering the Real Voice: How to Detect and Verify Audio Deepfakes. Medium, 2023. Disponível em: https://medium.com/htx-s-s-coe/uncovering-the-real-voice-how-to-detect-and-verify-audio-deepfakes-42e480d3f431. Acesso em: 5 mar. 2024.
IOFFE, Sergey; SZEGEDY, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015. Disponível em: https://arxiv.org/pdf/1502.03167. Acesso em: 2 jun. 2024.
KAWA, Piotr; PLATA, Marcin; SYGA, Piotr. Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection. arXiv preprint arXiv:2206.13979, 2022. Disponível em: https://arxiv.org/pdf/2206.13979v2. Acesso em: 20 mar. 2024.
KAWA, Piotr; PLATA, Marcin; SYGA, Piotr. Defense Against Adversarial Attacks on Audio DeepFake Detection. arXiv preprint arXiv:2212.14597, 2022. Disponível em: https://arxiv.org/pdf/2212.14597v2. Acesso em: 21 mar. 2024.
KEVIN, Chris. Feature Maps. Medium, 2018. Disponível em: https://medium.com/@chriskevin_80184/feature-maps-ee8e11a71f9e. Acesso em: 30 abr. 2024.
KHANJANI, Zahra; WATSON, Gabrielle; JANEJA, Vandana P. Audio deepfakes: A survey. Frontiers in Big Data, v. 5, p. 1001063, 2023. Disponível em: https://www.frontiersin.org/articles/10.3389/fdata.2022.1001063/full. Acesso em: 9 mar. 2024.
KIM, J. et al. Glow-TTS: A generative flow for text-to-speech via monotonic alignment search. arXiv preprint arXiv:2005.11129, 2020. Disponível em: https://arxiv.org/pdf/2005.11129. Acesso em: 4 abr. 2024.
KONG, J.; KIM, J.; BAE, J. HiFi-GAN: Generative adversarial networks for efficient and high fidelity speech synthesis. arXiv preprint arXiv:2010.05646, 2020. Disponível em: https://arxiv.org/pdf/2010.05646. Acesso em: 4 mar. 2024.
LEAL, R. DOS S. Datasets de Áudio em Português. Medium, 2021. Disponível em: https://medium.com/@renatoleal/datasets-de-%C3%A1udio-em-portugu%C3%AAs-b25316ec316a. Acesso em: 6 mar. 2024.
LECUN, Yann. Generalization and network design strategies. In: Connectionism in perspective. Elsevier, 1989. p. 143-155. Disponível em: http://yann.lecun.com/exdb/publis/pdf/lecun-89.pdf. Acesso em: 25 abr. 2024.
LEFFER, L. AI Audio Deepfakes Are Quickly Outpacing Detection. Scientific American, 2024. Disponível em: https://www.scientificamerican.com/article/ai-audio-deepfakes-are-quickly-outpacing-detection/. Acesso em: 5 mar. 2024.
LI, Xiang et al. Understanding the disharmony between dropout and batch normalization by variance shift. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. p. 2682-2690. Disponível em: https://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Understanding_the_Disharmony_Between_Dropout_and_Batch_Normalization_by_Variance_CVPR_2019_paper.pdf. Acesso em: 19 mar. 2024.
MARTINS, Américo. Eleições nos EUA: uso de deepfake e IA revela problema que pode se repetir no Brasil. CNN Brasil, 2024. Disponível em: https://www.cnnbrasil.com.br/internacional/eleicoes-nos-eua-uso-de-deepfake-e-ia-revela-problema-que-pode-se-repetir-no-brasil/. Acesso em: 10 jun. 2024.
MASOOD, Momina et al. Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward. Applied Intelligence, v. 53, n. 4, p. 3974-4026, 2023. Disponível em: https://arxiv.org/pdf/2103.00484. Acesso em: 10 mar. 2024.
MUBARAK, Rami et al. A Survey on the Detection and Impacts of Deepfakes in Visual, Audio, and Textual Formats. IEEE Access, 2023. Disponível em: https://ieeexplore.ieee.org/document/10365143. Acesso em: 9 mar. 2024.
MÜLLER, Nicolas M. et al. Does audio deepfake detection generalize? arXiv preprint arXiv:2203.16263, 2022. Disponível em: https://arxiv.org/abs/2203.16263. Acesso em: 10 mar. 2024.
OLIVEIRA, Rafael Santana. Desenvolvimento de recursos e ferramentas para reconhecimento de voz em português brasileiro para Desktop e Sistemas Embarcados. Dissertação (Mestrado) – Universidade Federal do Pará, Belém, 2012. Disponível em: https://ppgcc.propesp.ufpa.br/Dissertações_2012/Rafael%20Santana%20Oliveira_Dissertação.pdf. Acesso em: 3 mar. 2024.
PANDEY, Swarnima. How to choose the size of the convolution filter or Kernel size for CNN?. Medium, 2020. Disponível em: https://medium.com/analytics-vidhya/how-to-choose-the-size-of-the-convolution-filter-or-kernel-size-for-cnn-86a55a1e2d15. Acesso em: 30 abr. 2024.
PICZAK, Karol J. ESC: Dataset for environmental sound classification. In: Proceedings of the 23rd ACM international conference on Multimedia. 2015. p. 1015-1018. Disponível em: https://www.karolpiczak.com/papers/Piczak2015-ESC-Dataset.pdf. Acesso em: 23 abr. 2024.
PRADO, Magaly Pereira do. Deepfake de áudio: manipulação simula voz real para retratar alguém dizendo algo que não disse. TECCOGS – Revista Digital de Tecnologias Cognitivas, p. 45-68, 2021. Disponível em: https://revistas.pucsp.br/index.php/teccogs/article/view/55977/37926. Acesso em: 2 mar. 2024.
RALLABANDI, Srikari. Activation functions: ReLU vs. Leaky ReLU. Medium, 2023. Disponível em: https://medium.com/@sreeku.ralla/activation-functions-relu-vs-leaky-relu-b8272dc0b1be. Acesso em: 25 abr. 2024.
SALAMON, Justin; JACOBY, Christopher; BELLO, Juan Pablo. A dataset and taxonomy for urban sound research. In: Proceedings of the 22nd ACM international conference on Multimedia. 2014. p. 1041-1044. Disponível em: https://www.justinsalamon.com/uploads/4/3/9/4/4394963/salamon_urbansound_acmmm14.pdf. Acesso em: 10 set. 2024.
SCHILLING, Fabian. The Effect of Batch Normalization on Deep Convolutional Neural Networks. KTH Royal Institute of Technology, 2016. Disponível em: https://www.diva-portal.org/smash/get/diva2:955562/FULLTEXT01.pdf. Acesso em: 18 mai. 2024.
SCHÖRKHUBER, Christian; KLAPURI, Anssi. Constant-Q transform toolbox for music processing. In: 7th sound and music computing conference, Barcelona, Spain. SMC, 2010. p. 3-64. Disponível em: https://www.researchgate.net/profile/Christian-Schoerkhuber/publication/228523955_Constant-Q_transform_toolbox_for_music_processing/links/55126aa60cf20bfdad513a3f/Constant-Q-transform-toolbox-for-music-processing.pdf. Acesso em: 14 mai. 2024.
SHEN, J. et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. arXiv preprint arXiv:1712.05884, 2017. Disponível em: https://arxiv.org/pdf/1712.05884. Acesso em: 5 mar. 2024.
SOUZA, Thiago Campolina de. Como inteligência artificial, deepfakes e agências de checagem atuam na arena da desinformação. Jornal da USP, 04 nov. 2022. Disponível em: https://jornal.usp.br/ciencias/como-inteligencia-artificial-deepfakes-e-agencias-de-checagem-atuam-na-arena-da-desinformacao/. Acesso em: 2 mar. 2024.
SRIVASTAVA, Nitish. Improving Neural Networks with Dropout. University of Toronto, 2013. Disponível em: https://www.cs.toronto.edu/~nitish/msc_thesis.pdf. Acesso em: 21 mai. 2024.
STUPP, Catherine. Fraudsters used AI to mimic CEO’s voice in unusual cybercrime case. The Wall Street Journal, 30 ago. 2019. Disponível em: https://fully-human.org/wp-content/uploads/2019/10/Stupp_Fraudsters-Used-AI-to-Mimic-CEOs-Voice-in-Unusual-Cybercrime-Case.pdf. Acesso em: 9 mar. 2024.
THAMBAWITA, Vajira et al. Impact of Image Resolution on Deep Learning Performance in Endoscopy Image Classification: An Experimental Study Using a Large Dataset of Endoscopic Images. Diagnostics, 2021. Disponível em: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8700246/. Acesso em: 20 abr. 2024.
VAN DEN OORD, A. et al. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016. Disponível em: https://arxiv.org/pdf/1609.03499. Acesso em: 4 mar. 2024.
VEAUX, Christophe et al. CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit. University of Edinburgh, The Centre for Speech Technology Research (CSTR), 2017. Disponível em: https://datashare.ed.ac.uk/handle/10283/2651. Acesso em: 16 fev. 2024.
WANG, Pin; FAN, En; WANG, Peng. Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recognition Letters, v. 141, p. 61-67, 2021. Disponível em: https://www.sciencedirect.com/science/article/pii/S0167865520302981. Acesso em: 17 mar. 2024.
YAMAGISHI, Junichi et al. ASVspoof 2019: The 3rd Automatic Speaker Verification Spoofing and Countermeasures Challenge database, 2019. Disponível em: https://erepo.uef.fi/handle/123456789/7718. Acesso em: 16 fev. 2024.