Abstract

Inspired by recent advances in deep learning, we propose a framework for reconstructing dynamic sequences of 2-D cardiac magnetic resonance (MR) images from undersampled data using a deep cascade of convolutional neural networks (CNNs) to accelerate the data acquisition process. In particular, we address the case where data are acquired using aggressive Cartesian undersampling. First, we show that when each 2-D image frame is reconstructed independently, the proposed method outperforms state-of-the-art 2-D compressed sensing approaches, such as dictionary learning-based MR image reconstruction, in terms of reconstruction error and reconstruction speed. Second, when reconstructing the frames of the sequences jointly, we demonstrate that CNNs can learn spatio-temporal correlations efficiently by combining convolution and data sharing approaches. We show that the proposed method consistently outperforms state-of-the-art methods and is capable of preserving anatomical structure more faithfully up to 11-fold undersampling. Moreover, reconstruction is very fast: each complete dynamic sequence can be reconstructed in less than 10 s and, for the 2-D case, each image frame can be reconstructed in 23 ms, enabling real-time applications.

Introduction

In many clinical scenarios, medical imaging is an indispensable diagnostic and research tool. One such important modality is Magnetic Resonance Imaging (MRI), which is non-invasive and offers excellent resolution with various contrast mechanisms to reveal different properties of the underlying anatomy. However, MRI is associated with an inherently slow acquisition process. This is because data samples of an MR image are acquired sequentially in $k$-space and the speed at which $k$-space can be traversed is limited by physiological and hardware constraints 1. M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly, “Compressed sensing MRI,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 72–82, Mar. 2008. A long data acquisition procedure imposes significant demands on patients, making this imaging modality expensive and less accessible. One possible approach to accelerate the acquisition process is to undersample $k$-space, which in theory provides an acceleration rate proportional to the reduction factor in the number of $k$-space traversals required. However, undersampling in $k$-space violates the Nyquist-Shannon theorem and generates aliasing artefacts when the image is reconstructed. The main challenge in this case is to find an algorithm that can recover an uncorrupted image, taking into account the undersampling regime combined with a-priori knowledge of appropriate properties of the image to be reconstructed.

Using Compressed Sensing (CS), images can be reconstructed from sub-Nyquist sampling, assuming the following: firstly, the images must be compressible, i.e. they have a sparse representation in some transform domain. Secondly, one must ensure incoherence between the sampling and sparsity domains to guarantee that the reconstruction problem has a unique solution and that this solution is attainable. In practice, this can be achieved with random subsampling of $k$-space, which produces aliasing patterns in the image domain that can be regarded as correlated noise. Under such assumptions, images can then be reconstructed through nonlinear optimisation or iterative algorithms. The class of methods which apply CS to the MR reconstruction problem is termed CS-MRI 1. M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly, “Compressed sensing MRI,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 72–82, Mar. 2008. In general, these methods use fixed sparsifying transforms, e.g. wavelet transforms. A natural extension of these approaches has been to enable more flexible representations with adaptive sparse modelling, where one attempts to learn the optimal sparse representation from the data directly. This can be done by exploiting, for example, dictionary learning (DL) 2. S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, May 2011.

To achieve more aggressive undersampling, several strategies can be considered. One way is to further exploit the inherent redundancy of the MR data. For example, in dynamic imaging, one can make use of spatio-temporal redundancies 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. 4. H. Jung, J. C. Ye, and E. Y. Kim, “Improved kt BLAST and kt SENSE using FOCUSS,” Phys. Med. Biol., vol. 52, no. 11, pp. 3201–3226, Jun. 2007. 5. T. M. Quan and W.-K. Jeong, “Compressed sensing reconstruction of dynamic contrast enhanced MRI using GPU-accelerated convolutional sparse coding,” in Proc. IEEE 13th Int. Symp. Biomed. Imag., Prague, Czech Republic, Apr. 2016, pp. 518–521. , or when imaging a full 3D volume, one can exploit redundancy from adjacent slices 6. A. Hirabayashi, N. Inamuro, K. Mimura, T. Kurihara, and T. Homma, “Compressed sensing MRI using sparsity induced from adjacent slice similarity,” in Proc. Int. Conf. Sampling Theory Appl. (SampTA), Washington, DC, USA, May 2015, pp. 287–291. . An alternative approach is to exploit sources of explicit redundancy of the data to turn the initially underdetermined problem arising from undersampling into a determined or overdetermined problem that is easily solved. This is the fundamental assumption underlying parallel imaging 7. M. Uecker, “ESPIRiT—An eigenvalue approach to autocalibrating parallel MRI: Where SENSE meets GRAPPA,” Magn. Reson. Med., vol. 71, no. 3, pp. 990–1001, Mar. 2014. . Similarly, one can make use of multi-contrast information 8. J. Huang, C. Chen, and L. Axel, “Fast multi-contrast MRI reconstruction,” Magn. Reson. Imag., vol. 32, no. 10, pp. 1344–1352, Dec. 2014. or the redundancy generated by multiple filter responses of the image 9. X. Peng and D. Liang, “MR image reconstruction with convolutional characteristic constraint (CoCCo),” IEEE Signal Process. Lett., vol. 22, no. 8, pp. 1184–1188, Aug. 2015. . These explicit redundancies can also be used to complement the sparse modelling of inherent redundancies 10. K. H. Jin, D. Lee, and J. C. Ye, “A novel k-space annihilating filter method for unification between compressed sensing and parallel MRI,” in Proc. IEEE Int. Symp. Biomed. Imag., New York, NY, USA, Apr. 2015, pp. 327–330. , 11. D. Liang, B. Liu, J. Wang, and L. Ying, “Accelerating SENSE using compressed sensing,” Magn. Reson. Med., vol. 62, no. 6, pp. 1574–1584, Dec. 2009. .

Recently, deep learning has been successful at tackling many computer vision problems. Deep neural network architectures, in particular convolutional neural networks (CNNs), are becoming the state-of-the-art technique for various imaging problems including image classification 12. K. He, X. Zhang, S. Ren, and J. Sun. (Dec. 2015). “Deep residual learning for image recognition.” [Online]. Available: https://arxiv.org/abs/1512.03385 , object localisation 13. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. NIPS, Montreal, QC, Canada, 2015, pp. 91–99. and image segmentation 14. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. 18th Int. Conf. MICCAI, Munich, Germany, 2015, pp. 234–241. . Deep architectures are capable of extracting features from data to build increasingly abstract representations, replacing the traditional approach of carefully hand-crafting features and algorithms. For example, it has already been demonstrated that CNNs outperform sparsity-based methods in super-resolution 15. C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, Feb. 2016. in terms of both reconstruction quality and speed 16. W. Shi, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proc. IEEE Conf. CVPR, Las Vegas, NV, USA, Jun. 2016, pp. 1874–1883. . One of the contributions of our work is to explore the application of CNNs in undersampled MR reconstruction and investigate whether they can exploit data redundancy through learned representations. In fact, CNNs have already been applied to compressed sensing from random Gaussian measurements 17. K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “ReconNet: Non-iterative reconstruction of images from compressively sensed measurements,” in Proc. IEEE Conf. CVPR, Las Vegas, NV, USA, Jun. 2016, pp. 449–458. . Despite the popularity of CNNs, there has only been preliminary research on CNN-based MR image reconstruction 18. Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-Net for compressive sensing MRI,” in Proc. NIPS Barcelona, Spain, 2016, pp. 10–18. , 19. S. Wang, “Accelerating magnetic resonance imaging via deep learning,” in Proc. IEEE 13th Int. Symp. Biomed. Imag., Prague, Czech Republic, Apr. 2016, pp. 514–517. , hence the applicability of CNNs to this problem for various imaging protocols has yet to be fully explored.

In this work we consider reconstructing dynamic sequences of 2D cardiac MR images with Cartesian undersampling, as well as reconstructing each frame independently, using CNNs. We view the reconstruction problem as a de-aliasing problem in the image domain. Reconstruction of undersampled MR images is challenging because the images typically have low signal-to-noise ratio, yet high-quality reconstructions are often needed for clinical applications. To resolve this issue, we propose a deep network architecture which forms a cascade of CNNs (code available at https://github.com/js3611/Deep-MRI-Reconstruction). Our cascade network closely resembles the iterative reconstruction of DL-based methods; however, our approach allows end-to-end optimisation of the reconstruction algorithm. For 2D reconstruction, the proposed method is compared to Dictionary Learning MRI (DLMRI) 2. S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, May 2011. and for dynamic reconstruction, the method is compared to Dictionary Learning with Temporal Gradient (DLTG) 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. , kt Sparse and Low-Rank (kt-SLR) 20. S. G. Lingala, Y. Hu, E. DiBella, and M. Jacob, “Accelerated dynamic MRI exploiting sparsity and low-rank structure: Kt SLR,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1042–1054, May 2011. and Low-Rank Plus Sparse Matrix Decomposition (L + S) 21. R. Otazo, E. Candes, and D. K. Sodickson, “Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components,” Magn. Reson. Med., vol. 73, no. 3, pp. 1125–1136, 2015. , which are the state-of-the-art compressed sensing and low-rank approaches. We show that the proposed method outperforms them in terms of reconstruction error and perceptual quality, especially for aggressive undersampling rates. Moreover, owing to GPU-accelerated libraries, images can be reconstructed efficiently using the approach. In particular, for 2D reconstruction, each image can be reconstructed in about 23 ms, which is fast enough to enable real-time applications. For the dynamic case, sequences can be reconstructed within 10 s, which is reasonably fast for off-line reconstruction methods.

Problem Formulation

Let $x \in \mathbb{C}^N$ represent a sequence of 2D complex-valued MR images stacked as a column vector, where $N = N_x N_y N_t$. Our problem is to reconstruct $x$ from $y \in \mathbb{C}^M$ ($M \ll N$), undersampled measurements in $k$-space, such that:

$$y = F_u x + e \tag{1}$$

Here $F_u \in \mathbb{C}^{M \times N}$ is an undersampled Fourier encoding matrix and $e \in \mathbb{C}^M$ is acquisition noise modelled as additive white Gaussian (AWG) noise. In the case of Cartesian acquisition, we have $F_u = MF$, where $F \in \mathbb{C}^{N \times N}$ applies the two-dimensional Discrete Fourier Transform (DFT) to each frame in the sequence and $M \in \mathbb{C}^{M \times N}$ is an undersampling mask selecting lines in $k$-space to be sampled for each frame. The corresponding subset of indices sampled in $k$-space is indicated by $\Omega$. For the fully-sampled case, $M = N$, the sequence is reconstructed by applying the 2D inverse DFT (IDFT) to each frame. However, Eq. (1) is underdetermined even in the absence of noise, and hence the inversion is ill-posed; in particular, applying the IDFT, which in this case is also called zero-filled reconstruction, results in a sequence of aliased images $x_u = F_u^H y$ due to sub-Nyquist sampling. Note that $F_u^H$ is the Hermitian of the encoding matrix, which first maps $y \in \mathbb{C}^M$ to the $k$-$t$ coordinates and then applies the 2D IDFT frame-wise. Examples of the aliased images are shown in Fig. 1. Therefore, in order to reconstruct $x$, one must exploit a-priori knowledge of its properties, which can be done by formulating an unconstrained optimisation problem:

$$\min_{x} \; R(x) + \lambda \| y - F_u x \|_2^2 \tag{2}$$

Fig. 1.

An example of the image acquisition with Cartesian undersampling for a sequence of cardiac cine images. (a) A ground truth sequence that is fully-sampled in $k$-space, shown along $x$-$y$ and $y$-$t$ for the image frame and the temporal profile respectively. (b) A Cartesian undersampling mask that only acquires 1/12 of samples in $k$-space, where white indicates the sampled lines. Each image frame is undersampled with the mask shown along $k_x$-$k_y$. The undersampling pattern along the temporal dimension is shown in $k_y$-$t$. (c) The zero-filled reconstruction of the image acquired using the 12-fold undersampling mask. (d, e) 4-fold Cartesian undersampling mask and the resulting zero-filled image. Note that the aliasing artefact becomes more prominent as the undersampling factor is increased.
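
As a concrete illustration of the acquisition model in Eq. (1) and the zero-filled reconstruction shown in Fig. 1, the following is a minimal NumPy sketch (not the authors' released code); the image size, acceleration factor and random test image are illustrative assumptions.

```python
import numpy as np

def cartesian_mask(ny, nx, acc=4, n_centre=8, rng=np.random.default_rng(0)):
    """Randomly select roughly ny/acc phase-encode lines, always keeping the centre."""
    mask = np.zeros((ny, nx), dtype=bool)
    centre = np.arange(ny // 2 - n_centre // 2, ny // 2 + n_centre // 2)
    n_rand = max(ny // acc - n_centre, 0)
    others = rng.choice(np.setdiff1d(np.arange(ny), centre), n_rand, replace=False)
    mask[np.concatenate([centre, others]), :] = True   # fully sampled along k_x (columns)
    return mask

def forward(x, mask):
    """y = F_u x: 2D DFT of the frame followed by masking (unsampled entries are zero)."""
    return np.fft.fft2(x, norm="ortho") * mask

def zero_filled(y, mask):
    """x_u = F_u^H y: inverse DFT of the zero-filled k-space."""
    return np.fft.ifft2(y * mask, norm="ortho")

# Toy usage with a random complex "image" standing in for one cardiac frame.
x = np.random.randn(256, 256) + 1j * np.random.randn(256, 256)
mask = cartesian_mask(256, 256, acc=4)
x_u = zero_filled(forward(x, mask), mask)
print(mask.mean(), np.abs(x - x_u).mean())   # sampling fraction and aliasing error
```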

$R$ expresses regularisation terms on $x$ and $\lambda \in \mathbb{R}$ allows the adjustment of data fidelity based on the noise level of the acquired measurements $y$. For CS-based methods, the regularisation terms $R$ typically involve $\ell_0$ or $\ell_1$ norms in the sparsifying domain of $x$. Our formulation is inspired by DL-based reconstruction approaches 2. S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, May 2011. , in which the problem is formulated as:

$$\min_{x, D, \{\gamma_i\}} \; \sum_i \left( \| R_i x - D \gamma_i \|_2^2 + \nu \| \gamma_i \|_0 \right) + \lambda \| y - F_u x \|_2^2 \tag{3}$$

Here $R_i$ is an operator which extracts a spatio-temporal image patch at $i$, and $\gamma_i$ is the corresponding sparse code with respect to a dictionary $D$. In this approach, the regularisation terms force $x$ to be approximated by the reconstructions from the sparse code of patches. By taking the same approach, for our CNN formulation, we force $x$ to be well-approximated by the CNN reconstruction:

$$\min_{x} \; \| x - f_{\mathrm{cnn}}(x_u \,|\, \theta) \|_2^2 + \lambda \| F_u x - y \|_2^2 \tag{4}$$

Here $f_{\mathrm{cnn}}$ is the forward mapping of the CNN parameterised by $\theta$, possibly containing millions of adjustable network weights, which takes in the zero-filled reconstruction $x_u$ and directly produces a reconstruction as an output. Since $x_u$ is heavily affected by aliasing from sub-Nyquist sampling, the CNN reconstruction can therefore be seen as solving a de-aliasing problem in the image domain. The approach of Eq. (4), however, is limited in the sense that the CNN reconstruction and the data fidelity are two independent terms. In particular, since the CNN operates in the image domain, it is trained to reconstruct the sequence without a-priori information of the acquired data in $k$-space. However, if we already know some of the $k$-space values, then the CNN should be discouraged from modifying them, up to the level of acquisition noise. Therefore, by incorporating the data fidelity in the learning stage, the CNN should be able to achieve better reconstruction. This means that the output of the CNN is now conditioned on $\Omega$ and $\lambda$. Our final reconstruction is then given simply by the output, $x_{\mathrm{cnn}} = f_{\mathrm{cnn}}(x_u \,|\, \theta, \lambda, \Omega)$. Given training data $\mathcal{D}$ of input-target pairs $(x_u, x_{\mathrm{gnd}})$, where $x_{\mathrm{gnd}}$ is the fully-sampled ground-truth data, we can train the CNN to produce an output that attempts to accurately reconstruct the data by minimising an objective function:

$$\mathcal{L}(\theta) = \sum_{(x_u,\, x_{\mathrm{gnd}}) \in \mathcal{D}} \ell\left( x_{\mathrm{gnd}},\, x_{\mathrm{cnn}} \right) \tag{5}$$

where $\ell$ is a loss function. In this work, we consider an element-wise squared loss, which is given by $\ell(x_{\mathrm{gnd}}, x_{\mathrm{cnn}}) = \| x_{\mathrm{gnd}} - x_{\mathrm{cnn}} \|_2^2$.
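
A hedged PyTorch sketch of this objective, under the assumption (used later in the paper) that complex images are stored as two real channels; `model` is a placeholder for any de-aliasing network $f_{\mathrm{cnn}}$:

```python
import torch

def complex_to_channels(x):
    """Map a complex tensor of shape (N, H, W) to a 2-channel real tensor (N, 2, H, W)."""
    return torch.stack([x.real, x.imag], dim=1)

def training_loss(model, x_u, x_gnd):
    """Element-wise squared loss l(x_gnd, x_cnn) = ||x_gnd - x_cnn||_2^2, averaged per example."""
    x_cnn = model(complex_to_channels(x_u))            # CNN de-aliasing of the zero-filled input
    return torch.sum((complex_to_channels(x_gnd) - x_cnn) ** 2) / x_u.shape[0]
```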

Data Consistency Layer

Denote the Fourier encoding of the image reconstructed by the CNN as $s_{\mathrm{cnn}} = F x_{\mathrm{cnn}} = F f_{\mathrm{cnn}}(x_u \,|\, \theta)$, where $s_{\mathrm{cnn}}(j)$ represents the entry at index $j$ in $k$-space. The undersampled data $y \in \mathbb{C}^M$ can be mapped onto the vectorised representation of the $k$-$t$ coordinates (in $\mathbb{C}^N$) by $s_0 = F F_u^H y$, which fills the non-acquired indices in $k$-space with zeros. In order to incorporate the data fidelity in the network architecture, we first note the following: for fixed network parameters $\theta$, Eq. (4) has a closed-form solution $s_{\mathrm{rec}}$ in $k$-space, given element-wise as in 2. S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, May 2011. :

$$s_{\mathrm{rec}}(j) = \begin{cases} s_{\mathrm{cnn}}(j) & \text{if } j \notin \Omega \\ \dfrac{s_{\mathrm{cnn}}(j) + \lambda s_0(j)}{1 + \lambda} & \text{if } j \in \Omega \end{cases} \tag{6}$$

The final reconstruction in the image domain is then obtained by applying the inverse Fourier encoding, $x_{\mathrm{rec}} = F^H s_{\mathrm{rec}}$. The solution yields a simple interpretation: if the $k$-space coefficient $s_{\mathrm{rec}}(j)$ is initially unknown (i.e. $j \notin \Omega$), then we use the predicted value from the CNN. For the entries that have already been sampled ($j \in \Omega$), we take a linear combination of the CNN prediction and the original measurement, weighted by the level of noise present in $s_0$. In the limit $\lambda \to \infty$ we simply replace the $j$-th predicted coefficient in $\Omega$ by the original coefficient. For this reason, this operation is called a data consistency step in $k$-space (DC). In the case where there is non-negligible noise present in the acquisition, $\lambda = q/\sigma$ must be adjusted accordingly, where $q$ is a hyper-parameter and $\sigma^2$ is the power of the AWG noise in $k$-space (i.e. $\Re(e_i), \Im(e_i) \sim \mathcal{N}(0, \sigma/2)$). In 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. , it is empirically shown that $q \in [5 \times 10^5, 5 \times 10^6]$ for $\sigma^2 \in [4 \times 10^8, 10^9]$ works sufficiently well.
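
A minimal NumPy sketch of the element-wise data consistency step in Eq. (6); `mask` is the boolean indicator of $\Omega$, and passing `lam=None` emulates the noiseless limit $\lambda \to \infty$ (hard replacement of the acquired entries):

```python
import numpy as np

def data_consistency(s_cnn, s_0, mask, lam=None):
    """Blend the CNN k-space prediction with the acquired samples s_0 on Omega (Eq. 6)."""
    if lam is None:                                    # noiseless case: lambda -> infinity
        return np.where(mask, s_0, s_cnn)
    return np.where(mask, (s_cnn + lam * s_0) / (1.0 + lam), s_cnn)
```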

Since the DC step has a simple expression, we can in fact treat it as a layer operation of the network, which we denote as a DC layer. When defining a layer of a network, the rules for forward and backward passes must be specified in order for the network to be end-to-end trainable. This is because CNN training can effectively be performed through stochastic gradient descent, where one updates the network parameters $\theta$ to minimise the objective function $\mathcal{L}$ by descending along the direction given by the derivative $\partial \mathcal{L} / \partial \theta^T$. Therefore, it is necessary to define the gradients of each network layer relative to the network’s output. In practice, one uses an efficient algorithm called backpropagation 22. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, Oct. 1986. , where the final gradient is given by the product of all the Jacobians of the layers contributing to the output. Hence, in general, it suffices to specify a layer operation $f_L$ for the forward pass and derive the Jacobian of the layer with respect to the layer input, $\partial f_L / \partial x^T$, for the backward pass.

A. Forward Pass

The data consistency in $k$-space can be simply decomposed into three operations: the Fourier transform $F$, the data consistency $f_{\mathrm{dc}}$ and the inverse Fourier transform $F^H$. The data consistency $f_{\mathrm{dc}}$ performs the element-wise operation defined in Eq. (6), which, assuming $s_0(j) = 0 \;\; \forall j \notin \Omega$, can be written in matrix form as:

$$f_{\mathrm{dc}}(s, s_0; \lambda) = \Lambda s + \frac{\lambda}{1 + \lambda} s_0 \tag{7}$$

Here $\Lambda$ is a diagonal matrix of the form:

$$\Lambda_{kk} = \begin{cases} 1 & \text{if } k \notin \Omega \\ \dfrac{1}{1 + \lambda} & \text{if } k \in \Omega \end{cases} \tag{8}$$

Combining the three operations defined above, we can obtain the forward pass of the layer performing data consistency in k-space:

$$f_L(x, y; \lambda) = F^H \Lambda F x + \frac{\lambda}{1 + \lambda} F_u^H y \tag{9}$$
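
Composing the three operations $F$, $f_{\mathrm{dc}}$ and $F^H$ gives the forward pass of Eq. (9); a hedged NumPy sketch, with the diagonal of $\Lambda$ built explicitly as in Eq. (8) and `k0` denoting the zero-filled k-space data $s_0 = F F_u^H y$:

```python
import numpy as np

def dc_layer_forward(x, k0, mask, lam=1e6):
    """f_L(x, y; lambda) = F^H ( Lambda F x + lambda/(1+lambda) s_0 ), applied to one frame."""
    diag_lambda = np.where(mask, 1.0 / (1.0 + lam), 1.0)     # diagonal of Lambda (Eq. 8)
    s = np.fft.fft2(x, norm="ortho")                         # F x
    s_rec = diag_lambda * s + (lam / (1.0 + lam)) * k0       # f_dc (Eq. 7), with s_0 zero off Omega
    return np.fft.ifft2(s_rec, norm="ortho")                 # F^H s_rec
```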

B. Backward Pass

In general, one requires Wirtinger calculus to derive a gradient in the complex domain 23. A. Faijul and K. Murase, “Learning algorithms in complex-valued neural networks using Wirtinger calculus,” in Complex-Valued Neural Networks: Advances and Applications. Hoboken, NJ, USA: Wiley, 2013, pp. 550–559. . However, in our case, the derivation greatly simplifies due to the linearity of the DFT matrix and the data consistency operation. The Jacobian of the DC layer with respect to the layer input $x$ is therefore given by:

$$\frac{\partial f_L}{\partial x^T} = F^H \Lambda F \tag{10}$$

Note that unlike many other applications where CNNs process real-valued data, MR images are complex-valued and the network needs to account for this. One possibility would be to design the network to perform complex-valued operations. A simpler approach, however, is to accommodate the complex nature of the data with real-valued operations in a dimensional space twice as large (i.e. we replace $\mathbb{C}^N$ by $\mathbb{R}^{2N}$). In the latter case, the derivations above still hold due to the fundamental assumption in Wirtinger calculus.

The DC layer has one hyperparameter $\lambda \in \mathbb{R}$. This value can be fixed or made trainable. In the latter case, the derivative $\partial f_{\mathrm{dc}} / \partial \lambda$ (a column vector here) is given by:

$$\left[ \frac{\partial f_{\mathrm{dc}}(s, s_0; \lambda)}{\partial \lambda} \right]_j = \begin{cases} 0 & \text{if } j \notin \Omega \\ \dfrac{s_0(j) - s_{\mathrm{cnn}}(j)}{(1 + \lambda)^2} & \text{if } j \in \Omega \end{cases} \tag{11}$$

and the update is $\Delta\lambda = J_e \, \partial f_{\mathrm{dc}} / \partial \lambda$, where $J_e$ is the error backpropagated via the Jacobians of the layers proceeding $f_{\mathrm{dc}}$.
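
In an automatic-differentiation framework the Jacobians of Eqs. (10)-(11) are computed for us, so the DC layer only needs its forward pass; below is a hedged PyTorch sketch (not the authors' implementation) of a DC layer with an optionally trainable $\lambda$, operating on complex tensors:

```python
import torch
import torch.nn as nn

class DataConsistency(nn.Module):
    """k-space data consistency layer (Eq. 6) with an optionally trainable lambda."""

    def __init__(self, lam_init=1.0, trainable=True):
        super().__init__()
        if trainable:
            self.lam = nn.Parameter(torch.tensor(float(lam_init)))
        else:
            self.register_buffer("lam", torch.tensor(float(lam_init)))

    def forward(self, x, k0, mask):
        # x: complex image batch (N, H, W); k0: acquired k-space, zero off Omega;
        # mask: 0/1 float tensor marking Omega. Autograd supplies the backward pass.
        s = torch.fft.fft2(x, norm="ortho")                                   # F x
        s_rec = (1 - mask) * s + mask * (s + self.lam * k0) / (1 + self.lam)  # Eq. (6)
        return torch.fft.ifft2(s_rec, norm="ortho")                           # F^H s_rec
```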

Cascading Network

For CS-based methods, in particular for DL-based methods, optimisation problems such as Eq. (3) are solved using a coordinate-descent type algorithm, alternating between the de-aliasing step and the data consistency step until convergence. In contrast, with CNNs, we perform one-step de-aliasing and the same network cannot be used to de-alias iteratively. While CNNs may be powerful enough to learn one-step reconstruction, such a network could show signs of overfitting unless there are vast amounts of training data. In addition, training such networks may require a long time as well as careful fine-tuning steps. It is therefore best to be able to use CNNs for iterative reconstruction approaches.

A simple solution is to train a second CNN which learns to reconstruct from the output of the first CNN. In fact, we can concatenate a new CNN on the output of the previous CNN to build extremely deep networks which iterate between intermediate de-aliasing and the data consistency reconstruction. We term this a cascading network. In fact, one can essentially view this as unfolding the optimisation process of DLMRI. If each CNN expresses the dictionary learning reconstruction step, then the cascading CNN can be seen as a direct extension of DLMRI, where the whole reconstruction pipeline can be optimised from training, as seen in Fig. 4. In particular, owing to the forward and backward propagation rules defined for the DC layer, all subnetworks can be trained jointly in an end-to-end manner, yielding one large network.
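
A hedged PyTorch sketch of such a cascade for the 2D case: $n_c$ small convolutional de-aliasing blocks with residual connections, interleaved with a data consistency step (here the hard, noiseless variant for brevity). This is a simplified illustration rather than the released implementation.

```python
import torch
import torch.nn as nn

def hard_data_consistency(x, k0, mask):
    """Replace the acquired k-space entries (lambda -> infinity in Eq. 6)."""
    s = torch.fft.fft2(x, norm="ortho")
    s = torch.where(mask.bool(), k0, s)
    return torch.fft.ifft2(s, norm="ortho")

def cnn_block(n_d=5, n_f=64):
    """(n_d - 1) conv + ReLU layers followed by a projection back to 2 channels."""
    layers, ch = [], 2
    for _ in range(n_d - 1):
        layers += [nn.Conv2d(ch, n_f, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
        ch = n_f
    return nn.Sequential(*layers, nn.Conv2d(ch, 2, kernel_size=3, padding=1))

class CascadeNet(nn.Module):
    def __init__(self, n_c=5, n_d=5, n_f=64):
        super().__init__()
        self.cnns = nn.ModuleList([cnn_block(n_d, n_f) for _ in range(n_c)])

    def forward(self, x, k0, mask):
        # x: complex zero-filled input (N, H, W); k0, mask: acquired k-space and Omega indicator
        for cnn in self.cnns:
            x_ch = torch.stack([x.real, x.imag], dim=1)       # complex -> 2 real channels
            out = cnn(x_ch) + x_ch                            # residual de-aliasing step
            x = torch.complex(out[:, 0].contiguous(), out[:, 1].contiguous())
            x = hard_data_consistency(x, k0, mask)            # project back onto the acquired data
        return x
```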

Fig. 4.

A cascade of CNNs. DC denotes the data consistency layer and DS denotes the data sharing layer. The number of convolution layers within each network and the depth of the cascade are denoted by $n_d$ and $n_c$ respectively. Note also that the DS layer only applies when the input is a sequence of images.

Data Sharing Layer

For the case of reconstructing dynamic sequences, the temporal correlation between frames can be exploited as an additional regulariser to further de-alias the undersampled images. For this, we use 3D convolution to learn spatio-temporal features of the input sequence. In addition, we propose incorporating features that could benefit the CNN reconstruction, inspired by data sharing approaches 24. S. J. Riederer, T. Tasciyan, F. Farzaneh, J. N. Lee, R. C. Wright, and R. J. Herfkens, “MR fluoroscopy: Technical feasibility,” Magn. Reson. Med., vol. 8, no. 1, pp. 1–15, Sep. 1988. 25. V. Rasche, R. W. de Boer, D. Holz, and R. Proksa, “Continuous radial data acquisition for dynamic MRI,” Magn. Reson. Med., vol. 34, no. 5, pp. 754–761, Nov. 1995. 26. S. Zhang, K. T. Block, and J. Frahm, “Magnetic resonance imaging in real time: Advances using radial FLASH,” Magn. Reson. Imag., vol. 31, no. 1, pp. 101–109, Jan. 2010. : if the change in image content is relatively small for any adjacent frames, then the neighbouring $k$-space samples along the temporal axis often capture similar information. In fact, as long as this assumption is valid, for each frame, we can fill the entries using the samples from the adjacent frames to approximate missing $k$-space samples. Specifically, for each frame $t$, all frames from $t - n_{\mathrm{adj}}$ to $t + n_{\mathrm{adj}}$ are considered, filling the missing $k$-space samples at frame $t$. If more than one frame within the range contains a sample at the same location, we take the weighted average of the samples. The idea is demonstrated in Fig. 2.

Fig. 2.

The illustration of the data sharing approach. The acquired lines, which can be seen as $n_{\mathrm{adj}} = 0$, are colour-coded for each time frame. For each $n_{\mathrm{adj}}$, the missing entries in each frame are aggregated using the values from up to $\pm n_{\mathrm{adj}}$ neighbouring frames. The overlapped lines are averaged.

An example of data sharing with $n_{\mathrm{adj}} = 2$ applied to Cartesian undersampling is shown in Fig. 3(a). As data sharing aggregates the lines in $k$-space, the resulting images can be seen as a zero-filled reconstruction from a measurement with a lower undersampling factor. In practice, however, cardiac sequences contain highly dynamic content around the heart and hence combining the adjacent frames results in data inconsistency around the dynamic region, as illustrated in Fig. 3(b,c,d). However, for CNN reconstruction, we can incorporate these images as an extra input to train the network rather than treating them as the final reconstructions. Note that the reduction in the apparent acceleration factor is non-trivial to calculate: if each frame samples 10% of $k$-space, combining 5 adjacent frames should in theory cover 50%. However, one often relies on variable density sampling, which samples low-frequency terms more often, yielding overlapped lines between the adjacent frames. Therefore, the apparent acceleration factor is often much lower. As a remedy, regular sampling can be considered. However, regular sampling results in coherent artefacts in the image domain, the removal of which is a different problem from the one we address here, which attempts to resolve incoherent aliasing patterns. Alternatively, one can perform a sampling trajectory optimisation to reduce the overlapping factor; however, this is out of scope for this work and will be investigated in future.

Fig. 3.

The illustration of the data sharing approach applied to the image and the mask from Fig. 1 (a,b). In this figure, (a) shows the appearance of the resulting sequence for $n_{\mathrm{adj}} = 2$. (b) The entries in $k$-space that are either acquired or aggregated using the data sharing approach with $n_{\mathrm{adj}} = 2$, which conceptually defines a sampling mask. (c) For comparison, we show the resulting zero-filled reconstruction if (b) were treated as a mask. (d) The error map between (a) and (b). One can observe their similarity except for the data inconsistency of the dynamic content around the heart region. Note that for $n_{\mathrm{adj}} = 2$, the obtained image has an appearance similar to an acceleration factor of around 4 (rather than 12/5 = 2.4, which is the maximum achievable from 5 frames) due to overlapping lines.

For our network, we implement data sharing (DS) layers which take an input image and generate multiple “data-shared” images for a range of $n_{\mathrm{adj}}$. The resulting images are concatenated along the channel-axis and treated as a new input fed into the first convolution layer of the CNNs. Therefore, using the images obtained from data sharing can be interpreted as transforming the problem into joint estimation of the aliasing as well as the dynamic motion, where the effect of aliasing is considerably smaller. Note that for the cascading network architecture, from the second subnetwork onwards, the input to each subnetwork is no longer “undersampled”, but instead contains intermediate predicted values from the previous subnetwork. In this case, we average all the entries from the adjacent frames and update the samples which were not initially acquired. For this work, we allocate equal weight to all adjacent $k$-space samples; however, in future, more elaborate averaging schemes can be considered.
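
A hedged NumPy sketch of the data sharing operation for a single choice of $n_{\mathrm{adj}}$: missing $k$-space entries at frame $t$ are filled with the (equally weighted) average of the samples acquired at the same location within $\pm n_{\mathrm{adj}}$ frames, while originally acquired entries are left untouched. Array shapes are illustrative assumptions.

```python
import numpy as np

def data_share(kspace, mask, n_adj):
    """kspace: complex array (T, ky, kx), zero at unsampled entries; mask: matching boolean pattern."""
    T = kspace.shape[0]
    shared = kspace.copy()
    for t in range(T):
        lo, hi = max(0, t - n_adj), min(T, t + n_adj + 1)
        counts = mask[lo:hi].sum(axis=0)                       # how many neighbours sampled each entry
        averaged = kspace[lo:hi].sum(axis=0) / np.maximum(counts, 1)
        fill = (~mask[t]) & (counts > 0)                       # only entries missing at frame t
        shared[t][fill] = averaged[fill]
    return shared

# A DS layer would concatenate data_share(...) results for n_adj = 0, ..., 5 along the channel axis.
```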

Architecture and Implementation

Incorporating all the new elements mentioned above, we can devise our cascading network architecture. Our CNN takes in a two-channeled sequence of images in $\mathbb{R}^{2 N_x N_y N_t}$, where the channels store the real and imaginary parts of the zero-filled reconstruction in the image domain. Based on the literature, we used the following network architecture for the CNN, illustrated in Fig. 4: it has $n_d - 1$ 3D convolution layers $C_i$, which are all followed by Rectified Linear Units (ReLU) as a choice of nonlinearity. For each of them, we used a kernel size $k = 3$ 27. C. Szegedy, “Going deeper with convolutions,” in Proc. IEEE CVPR, Jun. 2015, pp. 1–9. and the number of filters was set to $n_f = 64$. The final layer of the CNN module is a convolution layer $C_{\mathrm{rec}}$ with $k = 3$ and $n_f = 2$, which projects the extracted representation back to the image domain. We also used a residual connection 12. K. He, X. Zhang, S. Ren, and J. Sun. (Dec. 2015). “Deep residual learning for image recognition.” [Online]. Available: https://arxiv.org/abs/1512.03385 , which sums the output of the CNN module with its input. Finally, we form a cascading network by interleaving the DC layers with the CNN reconstruction modules $n_c$ times. For the DS layer, we take the input to each subnetwork and generate images for all $n_{\mathrm{adj}} \in \{0, 1, \dots, 5\}$. As aforementioned, the resulting images are concatenated along the channel-axis and fed to the first convolution layer. We found that this choice of architecture works sufficiently well; however, the parameters were not optimised and there is therefore room for refinement of the results presented. Hence the results are likely to be improved by, for example, incorporating pooling layers and varying parameters such as the kernel size and stride 14. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. 18th Int. Conf. MICCAI, Munich, Germany, 2015, pp. 234–241. , 28. F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in Proc. ICLR, San Juan, Puerto Rico, 2016, pp. 1–9. .
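
A hedged PyTorch sketch of a single 3D de-aliasing module as described above: the DS layer supplies one image per $n_{\mathrm{adj}} \in \{0, \dots, 5\}$, concatenated along the channel axis (two real channels per image, giving 12 input channels), followed by $n_d - 1$ 3x3x3 convolution + ReLU layers and a final projection $C_{\mathrm{rec}}$ back to two channels. Adding the residual to the $n_{\mathrm{adj}} = 0$ channels is an assumption made for this illustration.

```python
import torch
import torch.nn as nn

class CNN3DModule(nn.Module):
    def __init__(self, n_adj_max=5, n_d=5, n_f=64):
        super().__init__()
        in_ch = 2 * (n_adj_max + 1)                 # data-shared images stacked channel-wise
        layers, ch = [], in_ch
        for _ in range(n_d - 1):
            layers += [nn.Conv3d(ch, n_f, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
            ch = n_f
        layers += [nn.Conv3d(ch, 2, kernel_size=3, padding=1)]   # C_rec: back to the image domain
        self.net = nn.Sequential(*layers)

    def forward(self, x_ds):
        # x_ds: (N, 2 * (n_adj_max + 1), T, H, W); channels 0:2 hold the n_adj = 0 (unshared) input
        return self.net(x_ds) + x_ds[:, :2]
```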

Our model can also be used for 2D image reconstruction by setting $N_t = 1$ and using 2D convolution layers instead; however, data sharing does not apply to 2D reconstruction. For the following experiments, we first explore the network configurations by considering 2D MR image reconstruction. We identify our network by the values of $n_c$, $n_d$ and the use of data sharing. For example, D5-C2 means a network with $n_d = 5$, $n_c = 2$ and no data sharing. D5-C10(S) corresponds to a network with $n_d = 5$, $n_c = 10$ and data sharing.

As mentioned, the pixel-wise squared error was used as the objective function. As the proposed architecture is memory-intensive, a small minibatch size is used to train the cascade networks. We used a minibatch size of 1 for all the experiments but did not observe any problem with convergence. We initialised the network weights using He initialisation 29. K. He, X. Zhang, S. Ren, and J. Sun. (Feb. 2015). “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.” [Online]. Available: https://arxiv.org/abs/1502.01852 . The Adam optimiser 30. D. Kingma and J. Ba. (Dec. 2014). “Adam: A method for stochastic optimization.” [Online]. Available: https://arxiv.org/abs/1412.6980 was used to train all models, with parameters $\alpha = 10^{-4}$, $\beta_1 = 0.9$ and $\beta_2 = 0.999$ unless specified. We also added $\ell_2$ weight decay of $10^{-7}$.
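
A hedged sketch of this training configuration (He initialisation, Adam with $\alpha = 10^{-4}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, weight decay $10^{-7}$); `model` stands for any of the cascade networks sketched earlier:

```python
import torch
import torch.nn as nn

def he_init(module):
    """He (Kaiming) initialisation for the convolution layers."""
    if isinstance(module, (nn.Conv2d, nn.Conv3d)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

def make_optimiser(model):
    model.apply(he_init)
    return torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-7)
```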

Experimental Results

A. Setup

1) Dataset:

Our method was evaluated using a cardiac MR dataset consisting of 10 fully sampled short-axis cardiac cine MR scans. Each scan contains a single-slice SSFP acquisition with 30 temporal frames, a 320×320 mm field of view and 10 mm slice thickness. The data consists of 32-channel data with a sampling matrix size of 192×190, which was zero-filled to a matrix size of 256×256. The raw multi-coil data was reconstructed using SENSE 31. K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger, “SENSE: Sensitivity encoding for fast MRI,” Magn. Reson. Med., vol. 42, no. 5, pp. 952–962, Nov. 1999. with no undersampling and retrospective gating. Coil sensitivity maps were normalised to a body coil image to produce a single complex-valued image set that could be back-transformed to regenerate complex $k$-space samples or further processed to form final magnitude images. For the following experiments, we perform retrospective undersampling, simulating a practical single-coil acquisition scenario.

2) Undersampling:

In this work, we focus on Cartesian undersampling, where one fully samples the frequency-encodes (along $k_x$) and randomly undersamples the phase encodes (along $k_y$). In addition, we pair consecutive phase encodes, which has been reported to reduce eddy currents, a source of image degradation 33. M. Markl, K. Scheffler, and O. Bieri, “Analysis and compensation of eddy currents in balanced SSFP,” Magn. Reson. Med., vol. 54, no. 1, pp. 129–137, Jul. 2005. . For each frame, the eight lowest spatial frequencies are always acquired and the other frequencies have a probability of being acquired determined by a zero-mean Gaussian variable density function that is marginally offset, such that the probability of acquisition never reaches zero even at the highest frequencies. An implementation of this approach can be found in 4. H. Jung, J. C. Ye, and E. Y. Kim, “Improved kt BLAST and kt SENSE using FOCUSS,” Phys. Med. Biol., vol. 52, no. 11, pp. 3201–3226, Jun. 2007. , and an example of a 2D mask and its effect on the magnitude of a temporal frame is shown in Fig. 5. For each experiment, the undersampling rate is fixed and will be stated. For training, the sampling masks were generated on-the-fly to allow the network to better learn the differences between potential aliasing artefacts and the underlying signal. Note that for each acceleration factor $\mathrm{acc}$, one can generate $\binom{k_y}{k_y/\mathrm{acc}}$ different masks.
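
A hedged NumPy sketch of such a 1D variable-density pattern (this is not the implementation from reference 4): the 8 central phase encodes are always kept, consecutive lines are paired, and the remaining pairs are drawn from a Gaussian density offset so that it never reaches zero; the Gaussian width is an illustrative assumption.

```python
import numpy as np

def variable_density_lines(n_ky, acc, n_centre=8, offset=0.01, rng=np.random.default_rng()):
    """Return a boolean mask over the n_ky phase-encode lines of one frame."""
    centre = np.arange(n_ky // 2 - n_centre // 2, n_ky // 2 + n_centre // 2)
    pairs = np.arange(0, n_ky - 1, 2)                          # first line of each phase-encode pair
    k = (pairs + 0.5 - n_ky / 2) / (n_ky / 2)                  # normalised frequency of each pair
    pdf = np.exp(-0.5 * (k / 0.25) ** 2) + offset              # Gaussian density that never reaches zero
    pdf /= pdf.sum()
    n_pairs = max(n_ky // acc - n_centre, 0) // 2
    chosen = rng.choice(pairs, size=n_pairs, replace=False, p=pdf)
    mask = np.zeros(n_ky, dtype=bool)
    mask[np.union1d(centre, np.concatenate([chosen, chosen + 1]))] = True
    return mask                                                 # broadcast along k_x and t as needed
```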

Fig. 5.

The detail of the Cartesian undersampling mask employed in this work. Note that the mask can be seen as a 3D volume indexed by ($k_x, k_y, t$). For each image frame $t$, we fully sample along the $k_x$-axis and undersample in the $k_y$ direction. We always acquire the 8 central lines, and the remaining lines are sampled according to a zero-mean Gaussian distribution whose tail is marginally offset so that it never reaches zero.

While Cartesian acquisition is the most common protocol in practice and offers straightforward implementation using fast Fourier transform (FFT), other practical sampling strategies such as radial 34. K. T. Block, M. Uecker, and J. Frahm, “Undersampled radial MRI with multiple coils. Iterative image reconstruction using a total variation constraint,” Magn. Reson. Med., vol. 57, no. 6, pp. 1086–1098, Jun. 2007. or spiral 35. M. Lustig, J. H. Lee, D. L. Donoho, and J. M. Pauly, “Faster imaging with randomly perturbed, under-sampled spirals and L1 reconstruction,” in Proc. ISMRM, Miami Beach, FL, USA, 2005, p. 685. could be considered, which achieve greater aliasing incoherence. Nevertheless, they require the use of methods such as nonuniform Fourier transforms and gridding 36. J. A. Fessler, “On NUFFT-based gridding for non-Cartesian MRI,” J. Magn. Reson., vol. 188, no. 2, pp. 191–195, Oct. 2007. which could propagate interpolation errors.

3) Data Augmentation:

Typically, deep learning benefits from large datasets, which are often not available for medical images. Our dataset is relatively small (300 images); however, the literature suggests that it is still possible to train a network by applying appropriate data augmentation strategies 14. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. 18th Int. Conf. MICCAI, Munich, Germany, 2015, pp. 234–241. . Therefore, we follow that practice and apply data augmentation including rigid transformation and elastic deformation to counter overfitting. Specifically, given each image (or a sequence of images), we randomly apply translation of up to $\pm 20$ pixels along the $x$- and $y$-axes, rotation in $[0, 2\pi)$, and reflection along the $x$-axis with a 50% chance. Therefore, from rigid transformation alone, we create 0.3 million augmented examples per image. Combined with the on-the-fly generation of undersampling masks, we generate a very large dataset. For the dynamic scenario, we further added elastic deformation, using the implementation in 32. P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in Proc. ICDAR, vol. 3. Aug. 2003, pp. 958–962. , with parameters $\alpha \in [0, 3]$ and $\sigma \in [0.05, 0.1]$, sampled uniformly, as well as reflection along the temporal axis. Note that while strong elastic deformation may produce anatomically unrealistic shapes, its use is justified as our goal is to train a network which learns to de-alias the underlying object in the image, rather than to explicitly learn anatomical shapes.
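
A hedged sketch of the rigid part of this augmentation (random translation of up to ±20 pixels, rotation in [0, 2π), reflection with 50% chance), applied identically to every frame of a sequence; SciPy interpolation is used here purely for illustration and is not necessarily what the authors used.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def augment_rigid(seq, rng=np.random.default_rng()):
    """seq: complex array of shape (T, H, W); the same transform is applied to all frames."""
    dy, dx = rng.integers(-20, 21, size=2)                 # translation in pixels
    angle = rng.uniform(0.0, 360.0)                        # rotation angle in degrees
    flip = rng.random() < 0.5                              # reflection along the x-axis
    out = []
    for frame in seq:
        f = rotate(frame.real, angle, reshape=False) + 1j * rotate(frame.imag, angle, reshape=False)
        f = shift(f.real, (dy, dx)) + 1j * shift(f.imag, (dy, dx))
        out.append(f[:, ::-1] if flip else f)
    return np.stack(out)
```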

4) Evaluation Methodology:

For the 2D experiments, we split the dataset into training and testing sets of 5 subjects each. Each image frame in the sequence is treated as an individual image, yielding a total of 150 images per set. Note that typically, a portion of the training data is treated as a validation set utilised for early stopping 37. C. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer-Verlag, 2006. , where one halts training if the validation error starts to increase. Initially, we used a 3–2–5 split for training, validation and testing. However, even after 3 days of training the cascade networks, we did not observe any decrease in the validation error. Therefore, we instead included the validation set in the training to further improve the performance, but fixed the number of backpropagations to be of the order of $10^5$, which we empirically found to be sufficient. For the dynamic experiments, we used a 7–3 split for training and testing and of the order of $10^4$ backpropagations.

To evaluate the performances of the trained networks, we used mean squared error (MSE) as our quantitative measure. The reconstruction signal-to-noise ratio from undersampled data is highly dependent on the imaging data and the undersampling mask. To take this into consideration for fair comparison, we assigned an arbitrary but fixed undersampling mask for each image in the test data, yielding a fixed number of image-mask pairs to be evaluated.

B. Reconstruction of 2D Images

1) Trade-Offs Between $n_d$ and $n_c$:

In this experiment we compared two architectures: D5-C2 ($n_d = 5$, $n_c = 2$) and D11-C1 ($n_d = 11$, $n_c = 1$) to evaluate the benefit of the DC step. The two networks have equivalent depths when the DC layers are viewed as feature extraction layers. However, the former can build deeper features of the image, whereas the latter benefits from the intermediate data consistency step. The undersampling rate was fixed to 3-fold and each network was trained end-to-end for $3 \times 10^5$ backpropagations.

The MSEs on the training and test data are shown in Fig. 6. Note that a gap between the performance on the training and test sets may exist by the nature of the dataset (e.g. due to image features, the initial level of aliasing, etc.) and therefore it is more informative to study the rate of improvement and the slope at the tail of the curves in combination to assess the overfitting process. Indeed, one can observe that D11-C1 eventually started to overfit the training data after about $1.2 \times 10^5$ backpropagations. As one would expect, since our dataset is small, deep networks can overfit easily. On the other hand, both train and test errors for D5-C2 were notably lower and had a relatively tighter gap, showing better generalisability compared to D11-C1. This is presumably because the architecture employs two data consistency steps and rebuilds the representations at each cascading iteration. This suggests that it is more beneficial to interleave DC layers, which project the acquired $k$-space onto intermediate reconstructions, with the CNN image reconstruction modules, which appears to help both the reconstruction as well as the generalisation. Nevertheless, there is a considerable gap between train and test data even for D5-C2. However, we note from the figure that even after $3 \times 10^5$ backpropagations, the test error is still improving. Therefore, although it seems that the network gets optimised to the features in the training data quickly, it still learns features generalisable to the test data. Having more training data is likely to accelerate the learning process.

Fig. 6.

A comparison of the networks with and without the intermediate DC step. D5-C2 shows superior performance over D11-C1. In particular, D5-C2 has considerably lower test error, showing an improved generalization property.

2) Effect of Cascading Iterations $n_c$:

In this experiment, we explored how much benefit the network can gain by increasing the number of cascading iterations. We fixed the architectures to have $n_d = 5$, but varied the cascading iteration $n_c \in \{1, 2, 3, 4, 5\}$. For this section, due to time constraints, we trained the networks using a greedy approach: we initialised the cascading net with $n_c = k$ using the net with $n_c = k - 1$ that was already trained. For each $n_c$, we performed $10^5$ backpropagations. Note that the greedy approach leads to a satisfactory solution; however, better results can be achieved with random initialisation, as initialising a network from another network's convergence point can make it more likely to get stuck in suboptimal local minima.

Reconstruction errors for each cascading network of different $n_c$ are shown in Fig. 7. We observed that while deeper cascading nets tend to overfit more, they still reduced the test error every time. The rate of improvement was reduced after 3 cascading layers; however, we see that the standard deviation of the error was also reduced for the deeper models. In the interest of space, we have not shown the resulting images of each D5-C$n_c$, but we have observed that increasing $n_c$ resulted in images with more of the subtle image details correctly reconstructed, and there was also less noise-like aliasing remaining in the images.

Fig. 7.

The effect of increasing the cascading iteration $n_c$. One can see that the reconstruction error on both training and test data monotonically decreases as $n_c$ increases. However, the rate of improvement is reduced after $n_c = 3$.

On the other hand, in Fig. 8, we show the intermediate reconstructions from each subnetwork within D5-C5 to better understand how the network exploits the iterative nature internally. In general, we see that the cascading net gradually recovers and sharpens the output image. Although the reconstruction error decreased monotonically at each cascading depth, we observed that the output of the fourth subnetwork appears to be more grainy than the output of the preceding subnetwork. This suggests the benefit of the end-to-end training scheme: since we are optimising the whole pipeline of reconstruction, the additional CNNs are internally used to rectify the error caused by the previous CNNs. In this case, the fourth subnetwork appears to counteract over-smoothing in the third subnetwork.

Fig. 8.

2D reconstruction results of D5-C5 for one of the test subjects. Here we inspect the intermediate output from each subnetwork in D5-C5. (a) Ground truth. (b) The input to the network, a 3× undersampled image. The output of the (c) first, (d) second, (e) third and (f) fourth cascading subnetwork respectively. (g,h) The final output and the corresponding error. Note that these are not the reconstruction results from the networks in the experiment in VII-B2.

3) Comparison With DLMRI:

In this experiment, we compared our model with the state-of-the-art DL-based method, DLMRI, for reconstructing individual 2D cardiac MR images. The comparison was performed for 3-fold and 6-fold acceleration factors.

a) Models:

For CNN, we selected the parameters $n_d = 5$, $n_c = 5$. To ensure a fair comparison, we report the aggregated result on the test set from two-way cross-validation (i.e. two iterations of training on five subjects and testing on the other five). For each iteration of the cross-validation, the network was trained end-to-end using He initialisation 29. K. He, X. Zhang, S. Ren, and J. Sun. (Feb. 2015). “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.” [Online]. Available: https://arxiv.org/abs/1502.01852 . For 6-fold undersampling, we initialised the network using the parameters obtained from the trained models from 3-fold acceleration. Each network was trained for $3 \times 10^5$ backpropagations, which took one week per network on a GeForce TITAN X; however, our manual inspection of the loss curve indicates that the training error plateaued at a much earlier stage, approximately within 3 days.

For DLMRI, we used the implementation from 2. S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, May 2011. with patch size 6×6. Since DLMRI is quite time-consuming, in order to obtain results within a reasonable amount of time we trained a joint dictionary over all time frames of each subject and reconstructed the frames in parallel; we did not observe any decrease in performance from this approach. For each subject, we ran 400 iterations to obtain the final reconstruction.

b) Results:
b) 结果:

The means of the reconstruction errors across 10 subjects are summarised in Table I. For both 3-fold and 6-fold acceleration, one can see that the CNN consistently outperformed DLMRI, and that the standard deviation of the error made by the CNN was smaller. The reconstructions from 6-fold acceleration are shown in Fig. 9. Although both methods suffered from significant loss of structure, the CNN was still capable of preserving texture better than DLMRI (highlighted by the red ellipse), whereas DLMRI created block-like artefacts due to over-smoothing. 6× undersampling of these images typically approaches the limit of sparsity-based methods; however, the CNN was able to predict some anatomical details that DLMRI could not. This could be due to the fact that the CNN has more free parameters to tune, allowing the network to learn complex but more accurate end-to-end transformations of the data.

TABLE I

The Result of 2D Reconstruction. DLMRI vs. CNN Across 10 Scans

二维重建的结果。DLMRI对比CNN,横跨10次扫描。

Fig. 9.

The comparison of 2D reconstructions from DLMRI and CNN for test data. (a) The original (b) 6× undersampled (c,d) CNN reconstruction and its error map (e,f) DLMRI reconstruction and its error map. There are larger errors in (f) than in (d), and the red ellipse highlights the anatomy that was reconstructed better by the CNN than by DLMRI.

在测试集上 DLMRI 和 CNN 二维重建的对比。 (a) 原始图像 (b) 6× 降采样 (c,d) CNN重建和它的误差图 (e,f) DLMRI重建和它的误差图。相比 (d),(f) 有更大的误差,而红色椭圆标出的解剖结构也是CNN重建得比DLMRI更好。

c) Comparison of reconstruction speed:
c) 重建速度的对比:

While training the CNN is time consuming, once it is trained, the inference can be done extremely quickly on a GPU. Reconstructing each slice took 23 ± 0.1 milliseconds on a GeForce GTX 1080, which enables real-time applications. To produce the above results, DLMRI took about 6.1 ± 1.3 hours per subject on CPU. Even though we do not have a GPU implementation of DLMRI, it is expected to take longer than 23ms because DLMRI requires dozens of iterations of dictionary learning and sparse coding steps. Using a fixed, pre-trained dictionary could remove this bottleneck in computation although this would likely be to the detriment of reconstruction quality.
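
Per-slice latency figures such as the 23 ms quoted above can be measured with a simple harness along the following lines; the reconstruct callable is a stand-in for the trained model, and on a GPU framework the device must be synchronised before reading the clock.

import time
import numpy as np

def mean_latency(reconstruct, x, n_repeats=100):
    reconstruct(x)                      # warm-up run (compilation, caching)
    start = time.perf_counter()
    for _ in range(n_repeats):
        reconstruct(x)                  # on GPU: synchronise inside or after this call
    return (time.perf_counter() - start) / n_repeats

# e.g. latency for a single 256x256 complex-valued slice
# print(mean_latency(trained_cnn, np.zeros((256, 256), dtype=np.complex64)))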

C. 3D Experiments

C. 3D实验

For the following experiments, we split our dataset into training and testing sets containing seven and three subjects respectively. Compared to the 2D case, we have significantly less data. As mentioned above, we applied elastic deformations in addition to rigid transformations to augment the training data and increase the variation of the examples seen by the network. Furthermore, working with a large input is a burden on memory, limiting the size of the network that can be used. To address this, we trained our model on inputs of size 256×Npatch×30, where the direction of patch extraction corresponds to the frequency-encoding direction. In this way, we can train the network with the same aliasing patterns while reducing the input size. Note that the extracted patches of an image sequence have different k-space values than the original data once the field-of-view (FOV) is reduced. As such, this trick only works for training, where the patches can be treated as new instances of training data. At test time, since only the raw data with full FOV is available, the CNN must be applied to the entire volume in order to perform the data consistency step correctly.
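
A sketch of this patch-extraction trick is given below, assuming an Nx × Ny × Nt complex image sequence in which Ny is the fully sampled frequency-encoding (readout) direction; cropping along this axis leaves the aliasing pattern along the phase-encoding direction unchanged, so the crops can serve as extra training instances. The axis layout, patch width and stride are assumptions for illustration.

import numpy as np

def extract_patches(seq, n_patch=32, stride=32):
    # seq: complex image sequence of shape (Nx, Ny, Nt); Ny is assumed to be the
    # fully sampled frequency-encoding direction, so cropping along it does not
    # change the aliasing along the phase-encoding direction.
    _, ny, _ = seq.shape
    starts = range(0, ny - n_patch + 1, stride)
    return np.stack([seq[:, y:y + n_patch, :] for y in starts])

# training: patches of shape (256, n_patch, 30) are treated as new instances;
# at test time the full field of view must be used so that the data-consistency
# step sees the true k-space of the acquired data.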

1) Effect of Data Sharing:

1) 数据共享的作用:

In this experiment, we evaluated the effect of using the features obtained from data sharing. We trained the following two networks: D5-C10(S) (nd=5, nc=10 with data sharing) and D6-C10 (nd=6, nc=10 without data sharing). In the second network, data sharing is replaced by an additional convolution layer to account for the additional input. We trained each model to reconstruct the sequences from 9-fold undersampling for 2.5×10^4 backpropagations. Their learning curves are plotted in Fig. 10. There is a considerable difference in their errors: the error of D5-C10(S) was smaller for both training and test data, suggesting that it was able to learn a de-aliasing strategy that generalises better. Moreover, by using data sharing, the network was able to learn faster. The visualisation of their reconstructions can be found in the following section.

Fig. 10.

The effect of data sharing. The network with data sharing shows superior performance to the one without it. In particular, it has considerably lower test error, showing an improved generalisation property.

数据共享的效果。有数据共享的网络展现出更好的性能。特别地,它的测试误差大幅降低,体现了更好的泛化特性。
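
To make the data-sharing input concrete, the sketch below fills each frame's unsampled k-space lines with the nearest acquired samples from neighbouring frames; the array layout, neighbourhood size and nearest-frame search are assumptions rather than the exact scheme used in the paper.

import numpy as np

def data_sharing(k_u, mask, n_adjacent=2):
    # k_u:  zero-filled k-space of shape (Nx, Ny, Nt)
    # mask: binary sampling mask of the same shape (1 where acquired)
    _, _, nt = k_u.shape
    k_shared = k_u.copy()
    filled = mask.astype(bool)
    for t in range(nt):
        for dt in range(1, n_adjacent + 1):
            for t2 in (t - dt, t + dt):
                if 0 <= t2 < nt:
                    # copy samples still missing in frame t but acquired in frame t2
                    take = ~filled[..., t] & mask[..., t2].astype(bool)
                    k_shared[..., t][take] = k_u[..., t2][take]
                    filled[..., t] |= take
    return k_shared

The inverse FFT of k_shared would then provide the data-shared images that are supplied to the network as additional input alongside the zero-filled reconstruction.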

2) Comparison With State-of-the-Art:

2) 与其他先进成果的比较:

In this experiment, we compared our model with state-of-the-art methods: DLTG 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. , kt-SLR 20. S. G. Lingala, Y. Hu, E. DiBella, and M. Jacob, “Accelerated dynamic MRI exploiting sparsity and low-rank structure: Kt SLR,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1042–1054, May 2011. and L+S 21. R. Otazo, E. Candes, and D. K. Sodickson, “Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components,” Magn. Reson. Med., vol. 73, no. 3, pp. 1125–1136, 2015. for reconstructing the dynamic sequence. We compared the results for 3, 6, 9 and 11-fold acceleration factors.

a) Models:
a) 模型:

For the CNN, we used nd=5, nc=10 with data sharing as explained above. We also set the weight decay to 0 as we did not notice any overfitting of the model. Contrary to the 2D case, we trained each network as follows: we first pre-trained the network on various undersampling rates (0–9×) for 5×10^4 backpropagations. Subsequently, each network was fine-tuned for a specific undersampling rate using Adam with the learning rate reduced to 5×10^-5 for 10^4 backpropagations. We performed three-way cross-validation (two iterations training on 7 subjects and testing on the other 3, and one iteration training on 6 subjects and testing on the other 4) and aggregated the test errors. The pre-training and fine-tuning stages took approximately 3.5 days and 14 hours respectively on a GeForce GTX 1080. Since training is time-consuming, we did not train the networks for longer, but we speculate that they would benefit from further training with lower learning rates. For DLTG, we used the default parameters described in 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. . For kt-SLR, we performed a grid search to identify the optimal parameters for the data, which were μ1 = 10^-5, μ2 = 10^-8 and ρ = 0.1. Similarly, for L+S the optimal parameters were λ_L = 0.01 and λ_S = 0.01.
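
A sketch of how the variable undersampling rates used for pre-training might be drawn is given below, using a random Cartesian ky–t mask with a fully sampled low-frequency centre; the generator, centre width and sampling distribution are simplifications rather than the exact scheme of the paper.

import numpy as np

def random_cartesian_mask(ny, nt, acc, n_centre=8, rng=np.random):
    # Binary ky-t mask: fully sampled centre plus randomly chosen phase-encoding
    # lines, so that roughly ny / acc lines are kept per frame.
    mask = np.zeros((ny, nt), dtype=np.float32)
    centre = np.arange(ny // 2 - n_centre // 2, ny // 2 + n_centre // 2)
    mask[centre, :] = 1
    n_keep = max(int(ny / acc) - n_centre, 0)
    outer = np.setdiff1d(np.arange(ny), centre)
    for t in range(nt):
        mask[rng.choice(outer, size=n_keep, replace=False), t] = 1
    return mask

# pre-training: draw the acceleration factor at random within the 0-9x range
# for every training example; fine-tuning: fix 'acc' to the target rate and
# lower the Adam learning rate (here 5e-5).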

b) Result:
b) 结果:

The final reconstruction errors are summarised in Fig. 11. We see that the CNN consistently outperforms the state-of-the-art methods for all undersampling factors. For a low acceleration factor (3× undersampling), all methods performed approximately the same; however, for more aggressive undersampling factors, the CNN was able to reduce the error by a considerable margin. At aggressive undersampling rates, the performance of kt-SLR and L+S, which employ low-rank and simple sparsity constraints, degraded much faster. We speculate that they underperformed in this regime because the data is not exactly low-rank (our temporal dimension is already small) and because their sparsifying transforms (temporal FFT for L+S and temporal gradient for kt-SLR) lack the adaptability to data of the CNN and DLTG. Reconstructions from 9-fold undersampling are shown in Fig. 12, including those of the CNN without data sharing and of DLTG; the reconstructions of kt-SLR and L+S were omitted as their quantitative errors were already much worse. One can see that, as in the 2D case, at aggressive undersampling rates the dictionary-learning-based method produced blocky artefacts, whereas the CNN methods were capable of reconstructing finer details (indicated by the red ellipse). On the other hand, the CNN without data sharing shows grainy, noise-like artefacts: even though it reconstructed the underlying anatomy more faithfully than DLTG, its overall error was worse. This artefact was not present in the images reconstructed by the CNN with data sharing. Although the quantitative result is not shown, the CNN without data sharing in fact outperformed DLTG at the low acceleration factor (3×) but not at more aggressive undersampling factors. This suggests that when aliasing is severe, a more drastic transformation is required; for the CNN to do better in this case, we would either need to increase its depth, at a higher computational cost, or use more training samples. This confirms the importance of data sharing and the necessity of exploiting domain knowledge to simplify the learning problem when data is limited. Temporal profiles of the reconstructions are shown in Fig. 13. Even though data sharing itself introduces data inconsistency in highly dynamic regions, the CNN was able to rectify this internally and reconstructed the correct motion with smaller errors than the other methods. This suggests the CNN's capability to jointly de-alias and implicitly estimate dynamic motion.

Fig. 11.

The reconstruction errors of the CNN vs. state-of-the-art methods across 10 subjects for different undersampling rates. Note that we average the test errors over all iterations of cross-validation.

在不同降采样率下,对比10个被试的CNN与各先进方法的重建误差。注意我们对所有交叉验证迭代的测试误差取了平均。

Fig. 12.

The comparison of cardiac MR image sequence reconstructions from DLTG and CNN. Here we show the n-th slice from one of the test subjects. (a) The original (b) 9× undersampled (c,d) CNN with data sharing and its error map (e,f) CNN without data sharing and its error map (g,h) DLTG reconstruction and its error map. Red ellipses highlight the anatomy that was reconstructed better by the CNN than by DLTG.

DLTG 和 CNN 重建心脏磁共振图像序列的对比。这里我们展示了其中一个测试被试的第 n 层。 (a) 原始图像 (b) 9× 降采样 (c,d) 有数据共享的CNN和它的误差图 (e,f) 没有数据共享的CNN和它的误差图 (g,h) DLTG 重建和它的误差图。红色椭圆标出的是CNN重建得比DLTG更好的解剖结构。

Fig. 13.

The comparison of reconstructions along the temporal dimension. Here we extract the 110th slice along the y-axis from the previous figure. (a) The original (b) 9× undersampled (c,d) CNN with data sharing and its error map (e,f) CNN without data sharing and its error map (g,h) DLTG reconstruction and its error map.

沿着时间维度的重建结果对比。这里我们从前面的图片中沿着y轴提取了第110层。 (a) 原始图像 (b) 9× 降采样图像 (c,d) 有数据共享层的CNN和它的误差图。 (e,f) 没有数据共享层的CNN和它的误差图 (g,h) DLTG 重建和它的误差图。

c) Reconstruction with noise:
c) 在有噪声的环境中重建:

This section analyses the impact of acquisition noise on reconstruction performance. In this experiment we fixed the acceleration factor to 3 and varied the level of noise in the data. Specifically, we tested noise powers σ^2 ∈ [10^-9, 4×10^-8]. For fully-sampled reconstruction, these noise powers are equivalent to peak signal-to-noise ratio (PSNR) values of 41.84 dB and 25.81 dB for 10^-9 and 4×10^-8 respectively, where PSNR was calculated as 10 log10(1/MSE). The results are summarised in Fig. 14, where we aggregate the reconstruction error from all 10 subjects. The input level of noise is indicated by PSNR_f and, for consistency, the reconstruction results are also reported as PSNR (higher is better). For DLTG, we used the value λ = 5×10^-6 recommended in 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. . DLTG showed decent robustness to noise, owing to the underlying K-SVD, whose sparse coding has a denoising effect. For kt-SLR and L+S, we used the same parameters as before. They showed some robustness to small noise but did not perform well in the presence of aggressive noise, as the implementations (and the data consistency step in particular) do not explicitly account for it. Changing these implementations is likely to improve the results.

Fig. 14.

The aggregated test error across 10 subjects with injected noise. For each value of input noise power, PSNR_f is shown, together with the corresponding reconstruction PSNR of CNN-NAD, CNN-AD, DLTG, kt-SLR and L+S.

10个注入噪声被试的综合测试误差。不同的输入噪声功率用PSNRf表示,相应展示了 CNN-NAD、 CNN-AD、 DLTG、 kt-SLR 和 L+S 的重建PSNR。

Fig. 15.

The reconstruction with noise σ^2 = 4×10^-8. The aggregated test error across 10 subjects with injected noise. For each value of input noise power, PSNR_f is shown, together with the corresponding reconstruction PSNR of the CNN, the fine-tuned CNN, DLTG, kt-SLR and L+S.

σ^2 = 4×10^-8 噪声下的重建结果。测试误差综合了10个注入噪声的被试数据。不同的输入噪声功率用 PSNR_f 表示,并相应展示了 CNN、微调后的 CNN、DLTG、kt-SLR 和 L+S 的重建PSNR。
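
To make the noise model and the PSNR definition above concrete, a small sketch follows; adding complex Gaussian noise in k-space and assuming intensities normalised to a peak value of 1 are our assumptions for illustration.

import numpy as np

def add_noise(k0, sigma2, rng=np.random):
    # complex Gaussian noise of total power sigma^2 added to the acquired samples
    s = np.sqrt(sigma2 / 2.0)
    return k0 + rng.normal(0, s, k0.shape) + 1j * rng.normal(0, s, k0.shape)

def psnr(x_rec, x_ref):
    # PSNR = 10 log10(1 / MSE), assuming intensities normalised to [0, 1]
    mse = np.mean(np.abs(x_rec - x_ref) ** 2)
    return 10.0 * np.log10(1.0 / mse)

# note: the image-domain PSNR obtained for a given k-space noise power sigma^2
# depends on the FFT normalisation convention used for the k-space data.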

For the CNN, we used the model D5-C10(S) as before and tested the following two variants. First, we tested the performance of the CNN from the previous section, trained in the absence of noise, denoted CNN-NAD (blue curve). It can be seen that for low levels of noise (PSNR > 35 dB), CNN-NAD was able to maintain performance similar to the other methods. However, its performance degraded at almost the same rate as kt-SLR and L+S for high levels of noise. We then adapted CNN-NAD for noise as follows. First, we added noise to the training data, randomly sampling the noise power in the range [10^-9, 4×10^-8]. Second, we modified our data consistency layers to account for noise: we initialised λ of each DC layer as λ = q/σ = 0.025 (as in DLTG) and made these parameters trainable. We trained the network for 3×10^4 backpropagations; the result is denoted CNN-AD (green curve). Interestingly, the performance for very small noise (> 38 dB) became worse compared to the original CNN. However, beyond that point it showed a significant improvement for all levels of noise, demonstrating better robustness than the other methods. We also observed that after fine-tuning, λ had increased to 0.5. This signifies that, even though their reconstruction frameworks share an iterative nature, DLTG and CNN are fundamentally different approaches and the required parameter values also differ. Note that since we trained the network for a wide range of noise, the performance is likely to improve if a narrower noise range is selected for training. In practice, measuring the level of noise a priori is non-trivial; however, our CNN showed adaptability to a pre-specified range, which can indeed be simulated in practice.
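
For reference, the role of λ in the data-consistency layers can be written as the following k-space update, a standard weighted data-consistency rule consistent with the description above, where Ω denotes the set of acquired k-space locations and s_0 the acquired (possibly noisy) samples:

\hat{s}_{\mathrm{rec}}(k) =
\begin{cases}
\hat{s}_{\mathrm{cnn}}(k), & k \notin \Omega,\\
\dfrac{\hat{s}_{\mathrm{cnn}}(k) + \lambda\, s_{0}(k)}{1 + \lambda}, & k \in \Omega.
\end{cases}

In the noiseless limit λ → ∞ this reduces to replacing the predicted values by the acquired ones, whereas a finite (and here trainable) λ blends the CNN prediction with the noisy measurements.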

d) Reconstruction speed:
d) 重建速度:

As in the 2D case, DLTG is slow, taking 6.6 hours per subject on CPU. With the CNN, each sequence was reconstructed in 8.21 ± 0.02 s on average on a GeForce GTX 1080 GPU. This is significantly slower than reconstructing 2D images, as introducing a temporal axis greatly increases the computational cost of the convolution operations. Nevertheless, the reconstruction speed of our method is much faster than DLTG and is reasonably fast for offline reconstruction.

D. Memory Requirement

D. 内存需求

The memory requirement of the CNNs is determined by the number of network parameters, the number and sizes of the intermediate activation maps, and the space needed for computing the layer operations. The total number of network parameters is simply the sum over all layers. Each convolution layer has (k_x k_y k_t n_f' + 1) n_f parameters, where k_x, k_y, k_t are the kernel sizes along x, y and t, n_f' and n_f are the numbers of feature maps of the incoming and current convolution layers respectively, and the +1 accounts for the bias. For each DC layer, we also store one parameter for λ. For 2D reconstruction (k_t = 1), D5-C5 has about 0.6 million parameters, which occupy 2.3 MB of storage assuming single-precision floating point (N_precision = 4 bytes). For dynamic reconstruction, D5-C10(S) has 3.4 million parameters, which occupy about 13.6 MB.
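
The parameter counts above can be reproduced with a few lines; the layer configuration below (2 input/output channels for the real and imaginary parts, 64 feature maps, 3×3 kernels, nd convolution layers per subnetwork) is our reading of the architecture and should be treated as an assumption.

def n_conv_params(kx, ky, kt, nf_in, nf_out):
    # (kx*ky*kt*nf_in + 1) * nf_out, the +1 accounting for the bias
    return (kx * ky * kt * nf_in + 1) * nf_out

def cascade_params(nc, nd, kt=1, nf=64, k=3, n_ch=2):
    per_subnet = (n_conv_params(k, k, kt, n_ch, nf)             # first layer
                  + (nd - 2) * n_conv_params(k, k, kt, nf, nf)  # middle layers
                  + n_conv_params(k, k, kt, nf, n_ch))          # projection back to 2 channels
    return nc * (per_subnet + 1)                                # +1: lambda of each DC layer

print(cascade_params(nc=5, nd=5))             # ~0.57 M parameters (D5-C5, 2D, kt=1)
print(cascade_params(nc=5, nd=5) * 4 / 1e6)   # ~2.3 MB at 4 bytes per parameter
print(cascade_params(nc=10, nd=5, kt=3))      # ~3.4 M, roughly the dynamic D5-C10(S)
                                              # configuration, ignoring the extra
                                              # data-sharing input channels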

At the training stage, more than three times the number of parameters is required to compute and store the gradients. In addition, all intermediate activation maps need to be stored to perform backpropagation efficiently. For the proposed architecture, most of the activation maps belong to the convolution layers C_i; hence, their total size can be roughly estimated as N_batch N_x N_y N_t N_f n_c (n_d − 1) N_precision. With an input of size N_batch × N_x × N_y × N_t = 1×256×256×1, the memory required for the activation maps of D5-C5 is 335 MB. For the dynamic models, the memory requirement further increases by a factor of the temporal dimension N_t = 30. Therefore, the aforementioned trick of cropping the images along N_y is necessary to fit the model. For D5-C10(S), with an input of size 1×256×(256/8)×30, 2.4 GB is required for storing the activation maps alone. Finally, to obtain the total memory consumption for the training stage, this value needs to be further multiplied by factors that depend on the implementation of backpropagation, of operations such as convolution and FFT, and on any compilation optimisation performed by the library. For example, most implementations of backpropagation require twice the value above to account for the forward and backward passes. For our Theano implementation of D5-C10(S), the largest mini-batch size we could fit for the given input size on a GeForce GTX 1080 (8 GB) was 1.
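
Similarly, the dominant activation-map storage can be estimated directly from the expression above; this is a rough estimate that assumes 64 feature maps and ignores implementation-specific overheads.

def activation_bytes(n_batch, nx, ny, nt, nf=64, nc=5, nd=5, n_precision=4):
    # N_batch * N_x * N_y * N_t * N_f * n_c * (n_d - 1) * N_precision
    return n_batch * nx * ny * nt * nf * nc * (nd - 1) * n_precision

print(activation_bytes(1, 256, 256, 1) / 1e6)          # ~335 MB for D5-C5 (2D)
print(activation_bytes(1, 256, 32, 30, nc=10) / 1e9)   # D5-C10(S) with the cropped
                                                       # 1x256x(256/8)x30 input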

At the testing stage, the memory requirement is much lower because the intermediate activation maps do not need to be stored when only the forward pass is performed. In this case, the memory overhead is essentially that of the single largest activation map (a C_i layer), scaled by implementation-specific factors. Note that, as mentioned above, patch extraction cannot be used at test time. Nevertheless, we did not observe any problem using D5-C10(S) with an input of size 1×256×256×30 on a GeForce GTX 1080.

Discussion and Conclusion

讨论和总结

In this work, we evaluated the applicability of CNNs to the challenge of reconstructing undersampled cardiac MR image data. The experiments presented show that, using a network with interleaved data consistency stages, it is feasible to obtain a model which reconstructs images well. The CS and low-rank frameworks offer mathematical guarantees for signal recovery, which makes these approaches appealing in theory as well as in practice, even though the required sparsity cannot generally be genuinely achieved in medical imaging. Although CNNs offer no such guarantee, we have empirically shown that a CNN-based approach can outperform these methods. In addition, at very aggressive undersampling rates, the CNN method was capable of reconstructing most of the anatomical structures more accurately based on the learnt priors, while classical methods do not guarantee such behaviour.

Remarkably, we were able to train the CNN on a small dataset. We used several strategies to alleviate overfitting: first, as we employed the iterative architecture, each subnetwork has a relatively small receptive field and can therefore only perform local transformations. Second, we applied intensive data augmentation so that the network constantly sees new variations of the input, making it harder to overfit to any specific patterns. However, we speculate that, given more training data, we could drop the data augmentation and let the network learn coarser features by incorporating, for example, dilated or strided convolutions, which could further improve performance.

It is important to note that in the experiments presented the data was produced by retrospective undersampling of back transformed complex images (equivalent to single-coil data) obtained through an original SENSE reconstruction. Although the application of CNN reconstruction needs to be investigated in the more practical scenario of full array coil data from parallel MR, the results presented show a great potential to apply deep learning for MR reconstruction. The additional richness of array coil data has the potential to further improve performance, although it will also add considerable complexity to the required CNN architecture.

In this work, we showed that the network can be trained using arbitrary Cartesian undersampling masks of a fixed sampling rate, rather than selecting a fixed set of undersampling masks for training and testing. In addition, we were able to pre-train the network on various undersampling rates before fine-tuning it. This suggests that the network is capable of learning a generic strategy to de-alias the images. A further investigation should consider how tolerant the network is to different undersampling patterns, such as radial and spiral trajectories. As these trajectories produce aliasing artefacts with different properties, further validation is needed to determine the flexibility of our approach. Radial sampling, however, naturally fits the data sharing framework and can therefore be expected to push the performance of the network further. The data sharing approach may also make it feasible to adopt regular undersampling patterns, which are intrinsically more efficient. Another interesting direction would be to jointly optimise the undersampling mask within the learning framework.

To conclude, although CNNs learn only local representations, which should not affect global structure, it remains to be determined how the CNN approach behaves when a pathology or other more variable content is present in the images. We performed a cross-validation study to ensure that the network can handle unseen data acquired through the same acquisition protocol, but its generalisation properties must be evaluated carefully on a larger dataset. However, CNNs are flexible in the sense that one can incorporate application-specific priors into their objective functions to give more importance to preserving features of interest in the reconstruction, provided that such expert knowledge is available at training time. For example, analysis of cardiac images in clinical settings often employs segmentation and/or registration, and multi-task learning is a promising approach to further improve the utility of CNN-based MR reconstruction.