Abstract

Inspired by recent advances in deep learning, we propose a framework for reconstructing dynamic sequences of 2-D cardiac magnetic resonance (MR) images from undersampled data using a deep cascade of convolutional neural networks (CNNs) to accelerate the data acquisition process. In particular, we address the case where data are acquired using aggressive Cartesian undersampling. First, we show that when each 2-D image frame is reconstructed independently, the proposed method outperforms state-of-the-art 2-D compressed sensing approaches, such as dictionary learning-based MR image reconstruction, in terms of reconstruction error and reconstruction speed. Second, when reconstructing the frames of the sequences jointly, we demonstrate that CNNs can learn spatio-temporal correlations efficiently by combining convolution and data sharing approaches. We show that the proposed method consistently outperforms state-of-the-art methods and is capable of preserving anatomical structure more faithfully up to 11-fold undersampling. Moreover, reconstruction is very fast: each complete dynamic sequence can be reconstructed in less than 10 s and, for the 2-D case, each image frame can be reconstructed in 23 ms, enabling real-time applications.

Introduction

In many clinical scenarios, medical imaging is an indispensable diagnostic and research tool. One such important modality is Magnetic Resonance Imaging (MRI), which is non-invasive and offers excellent resolution with various contrast mechanisms to reveal different properties of the underlying anatomy. However, MRI is associated with an inherently slow acquisition process. This is because data samples of an MR image are acquired sequentially in $k$-space and the speed at which $k$-space can be traversed is limited by physiological and hardware constraints 1. M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly, “Compressed sensing MRI,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 72–82, Mar. 2008. A long data acquisition procedure imposes significant demands on patients, making this imaging modality expensive and less accessible. One possible approach to accelerate the acquisition process is to undersample $k$-space, which in theory provides an acceleration rate proportional to the reduction factor in the number of $k$-space traversals required. However, undersampling in $k$-space violates the Nyquist-Shannon theorem and generates aliasing artefacts when the image is reconstructed. The main challenge in this case is to find an algorithm that can recover an uncorrupted image, taking into account the undersampling regime combined with a-priori knowledge of appropriate properties of the image to be reconstructed.

Using Compressed Sensing (CS), images can be reconstructed from sub-Nyquist sampling, assuming the following: firstly, the images must be compressible, i.e. they have a sparse representation in some transform domain. Secondly, one must ensure incoherence between the sampling and sparsity domains to guarantee that the reconstruction problem has a unique solution and that this solution is attainable. In practice, this can be achieved with random subsampling of $k$-space, which produces aliasing patterns in the image domain that can be regarded as correlated noise. Under such assumptions, images can then be reconstructed through nonlinear optimisation or iterative algorithms. The class of methods which apply CS to the MR reconstruction problem is termed CS-MRI 1. M. Lustig, D. L. Donoho, J. M. Santos, and J. M. Pauly, “Compressed sensing MRI,” IEEE Signal Process. Mag., vol. 25, no. 2, pp. 72–82, Mar. 2008. In general, these methods use fixed sparsifying transforms, e.g. wavelet transforms. A natural extension of these approaches has been to enable more flexible representations with adaptive sparse modelling, where one attempts to learn the optimal sparse representation from the data directly. This can be done by exploiting, for example, dictionary learning (DL) 2. S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, May 2011.

To achieve more aggressive undersampling, several strategies can be considered. One way is to further exploit the inherent redundancy of the MR data. For example, in dynamic imaging, one can make use of spatio-temporal redundancies 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. 4. H. Jung, J. C. Ye, and E. Y. Kim, “Improved kt BLAST and kt SENSE using FOCUSS,” Phys. Med. Biol., vol. 52, no. 11, pp. 3201–3226, Jun. 2007. 5. T. M. Quan and W.-K. Jeong, “Compressed sensing reconstruction of dynamic contrast enhanced MRI using GPU-accelerated convolutional sparse coding,” in Proc. IEEE 13th Int. Symp. Biomed. Imag., Prague, Czech Republic, Apr. 2016, pp. 518–521. , or when imaging a full 3D volume, one can exploit redundancy from adjacent slices 6. A. Hirabayashi, N. Inamuro, K. Mimura, T. Kurihara, and T. Homma, “Compressed sensing MRI using sparsity induced from adjacent slice similarity,” in Proc. Int. Conf. Sampling Theory Appl. (SampTA), Washington, DC, USA, May 2015, pp. 287–291. . An alternative approach is to exploit sources of explicit redundancy of the data to turn the initially underdetermined problem arising from undersampling into a determined or overdetermined problem that is easily solved. This is the fundamental assumption underlying parallel imaging 7. M. Uecker, “ESPIRiT—An eigenvalue approach to autocalibrating parallel MRI: Where SENSE meets GRAPPA,” Magn. Reson. Med., vol. 71, no. 3, pp. 990–1001, Mar. 2014. . Similarly, one can make use of multi-contrast information 8. J. Huang, C. Chen, and L. Axel, “Fast multi-contrast MRI reconstruction,” Magn. Reson. Imag., vol. 32, no. 10, pp. 1344–1352, Dec. 2014. or the redundancy generated by multiple filter responses of the image 9. X. Peng and D. Liang, “MR image reconstruction with convolutional characteristic constraint (CoCCo),” IEEE Signal Process. Lett., vol. 22, no. 8, pp. 1184–1188, Aug. 2015. . These explicit redundancies can also be used to complement the sparse modelling of inherent redundancies 10. K. H. Jin, D. Lee, and J. C. Ye, “A novel k-space annihilating filter method for unification between compressed sensing and parallel MRI,” in Proc. IEEE Int. Symp. Biomed. Imag., New York, NY, USA, Apr. 2015, pp. 327–330. , 11. D. Liang, B. Liu, J. Wang, and L. Ying, “Accelerating SENSE using compressed sensing,” Magn. Reson. Med., vol. 62, no. 6, pp. 1574–1584, Dec. 2009. .

Recently, deep learning has been successful at tackling many computer vision problems. Deep neural network architectures, in particular convolutional neural networks (CNNs), are becoming the state-of-the-art technique for various imaging problems including image classification 12. K. He, X. Zhang, S. Ren, and J. Sun. (Dec. 2015). “Deep residual learning for image recognition.” [Online]. Available: https://arxiv.org/abs/1512.03385 , object localisation 13. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. NIPS, Montreal, QC, Canada, 2015, pp. 91–99. and image segmentation 14. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. 18th Int. Conf. MICCAI, Munich, Germany, 2015, pp. 234–241. . Deep architectures are capable of extracting features from data to build increasingly abstract representations, replacing the traditional approach of carefully hand-crafting features and algorithms. For example, it has already been demonstrated that CNNs outperform sparsity-based methods in super-resolution 15. C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, Feb. 2016. in terms of both reconstruction quality and speed 16. W. Shi, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proc. IEEE Conf. CVPR, Las Vegas, NV, USA, Jun. 2016, pp. 1874–1883. . One of the contributions of our work is to explore the application of CNNs in undersampled MR reconstruction and investigate whether they can exploit data redundancy through learned representations. In fact, CNNs have already been applied to compressed sensing from random Gaussian measurements 17. K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “ReconNet: Non-iterative reconstruction of images from compressively sensed measurements,” in Proc. IEEE Conf. CVPR, Las Vegas, NV, USA, Jun. 2016, pp. 449–458. . Despite the popularity of CNNs, there has only been preliminary research on CNN-based MR image reconstruction 18. Y. Yang, J. Sun, H. Li, and Z. Xu, “Deep ADMM-Net for compressive sensing MRI,” in Proc. NIPS Barcelona, Spain, 2016, pp. 10–18. , 19. S. Wang, “Accelerating magnetic resonance imaging via deep learning,” in Proc. IEEE 13th Int. Symp. Biomed. Imag., Prague, Czech Republic, Apr. 2016, pp. 514–517. , hence the applicability of CNNs to this problem for various imaging protocols has yet to be fully explored.

In this work we consider reconstructing dynamic sequences of 2D cardiac MR images with Cartesian undersampling, as well as reconstructing each frame independently, using CNNs. We view the reconstruction problem as a de-aliasing problem in the image domain. Reconstruction of undersampled MR images is challenging because the images typically have low signal-to-noise ratio, yet high-quality reconstructions are often needed for clinical applications. To resolve this issue, we propose a deep network architecture which forms a cascade of CNNs (code available at https://github.com/js3611/Deep-MRI-Reconstruction). Our cascade network closely resembles the iterative reconstruction of DL-based methods; however, our approach allows end-to-end optimisation of the reconstruction algorithm. For 2D reconstruction, the proposed method is compared to Dictionary Learning MRI (DLMRI) 2. S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, May 2011. and for dynamic reconstruction, the method is compared to Dictionary Learning with Temporal Gradient (DLTG) 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. , kt Sparse and Low-Rank (kt-SLR) 20. S. G. Lingala, Y. Hu, E. DiBella, and M. Jacob, “Accelerated dynamic MRI exploiting sparsity and low-rank structure: Kt SLR,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1042–1054, May 2011. and Low-Rank Plus Sparse Matrix Decomposition (L + S) 21. R. Otazo, E. Candes, and D. K. Sodickson, “Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components,” Magn. Reson. Med., vol. 73, no. 3, pp. 1125–1136, 2015. , which are the state-of-the-art compressed sensing and low-rank approaches. We show that the proposed method outperforms them in terms of reconstruction error and perceptual quality, especially for aggressive undersampling rates. Moreover, owing to GPU-accelerated libraries, images can be reconstructed efficiently using the approach. In particular, for 2D reconstruction, each image can be reconstructed in about 23 ms, which is fast enough to enable real-time applications. For the dynamic case, sequences can be reconstructed within 10 s, which is reasonably fast for off-line reconstruction methods.

Problem Formulation

Let $x \in \mathbb{C}^N$ represent a sequence of 2D complex-valued MR images stacked as a column vector, where $N = N_x N_y N_t$. Our problem is to reconstruct $x$ from $y \in \mathbb{C}^M$ ($M \ll N$), undersampled measurements in $k$-space, such that:

$$y = F_u x + e \tag{1}$$

Here $F_u \in \mathbb{C}^{M \times N}$ is an undersampled Fourier encoding matrix and $e \in \mathbb{C}^M$ is acquisition noise modelled as additive white Gaussian (AWG) noise. In the case of Cartesian acquisition, we have $F_u = MF$, where $F \in \mathbb{C}^{N \times N}$ applies the two-dimensional Discrete Fourier Transform (DFT) to each frame in the sequence and $M \in \mathbb{C}^{M \times N}$ is an undersampling mask selecting lines in $k$-space to be sampled for each frame. The corresponding subset of indices sampled in $k$-space is indicated by $\Omega$. For the fully-sampled case, $M = N$, the sequence is reconstructed by applying the 2D inverse DFT (IDFT) to each frame. However, Eq. (1) is underdetermined even in the absence of noise, and hence the inversion is ill-posed; in particular, applying the IDFT, which in this case is also called zero-filled reconstruction, results in a sequence of aliased images $x_u = F_u^H y$ due to sub-Nyquist sampling. Note that $F_u^H$ is the Hermitian of the encoding matrix, which first maps $y \in \mathbb{C}^M$ to the $k$-$t$ coordinates and then applies the 2D IDFT frame-wise. Examples of the aliased images are shown in Fig. 1. Therefore, in order to reconstruct $x$, one must exploit a-priori knowledge of its properties, which can be done by formulating an unconstrained optimisation problem:

$$\min_{x} \; R(x) + \lambda \| y - F_u x \|_2^2 \tag{2}$$

Fig. 1.

An example of the image acquisition with Cartesian undersampling for a sequence of cardiac cine images. (a) A ground truth sequence that is fully-sampled in $k$-space, shown along $x$-$y$ and $y$-$t$ for the image frame and the temporal profile respectively. (b) A Cartesian undersampling mask that only acquires 1/12 of samples in $k$-space, where white indicates the sampled lines. Each image frame is undersampled with the mask shown along $k_x$-$k_y$. The undersampling pattern along the temporal dimension is shown in $k_y$-$t$. (c) The zero-filled reconstruction of the image acquired using the 12-fold undersampling mask. (d, e) 4-fold Cartesian undersampling mask and the resulting zero-filled image. Note that the aliasing artefact becomes more prominent as the undersampling factor is increased.
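
As a concrete illustration of the acquisition model in Eq. (1) and the zero-filled reconstruction shown in Fig. 1, the following is a minimal NumPy sketch (not the authors' released code); the image size, acceleration factor and random test image are illustrative assumptions.

```python
import numpy as np

def cartesian_mask(ny, nx, acc=4, n_centre=8, rng=np.random.default_rng(0)):
    """Randomly select roughly ny/acc phase-encode lines, always keeping the centre."""
    mask = np.zeros((ny, nx), dtype=bool)
    centre = np.arange(ny // 2 - n_centre // 2, ny // 2 + n_centre // 2)
    n_rand = max(ny // acc - n_centre, 0)
    others = rng.choice(np.setdiff1d(np.arange(ny), centre), n_rand, replace=False)
    mask[np.concatenate([centre, others]), :] = True   # fully sampled along k_x (columns)
    return mask

def forward(x, mask):
    """y = F_u x: 2D DFT of the frame followed by masking (unsampled entries are zero)."""
    return np.fft.fft2(x, norm="ortho") * mask

def zero_filled(y, mask):
    """x_u = F_u^H y: inverse DFT of the zero-filled k-space."""
    return np.fft.ifft2(y * mask, norm="ortho")

# Toy usage with a random complex "image" standing in for one cardiac frame.
x = np.random.randn(256, 256) + 1j * np.random.randn(256, 256)
mask = cartesian_mask(256, 256, acc=4)
x_u = zero_filled(forward(x, mask), mask)
print(mask.mean(), np.abs(x - x_u).mean())   # sampling fraction and aliasing error
```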

$R$ expresses regularisation terms on $x$ and $\lambda \in \mathbb{R}$ allows the adjustment of data fidelity based on the noise level of the acquired measurements $y$. For CS-based methods, the regularisation terms $R$ typically involve $\ell_0$ or $\ell_1$ norms in the sparsifying domain of $x$. Our formulation is inspired by DL-based reconstruction approaches 2. S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, May 2011. , in which the problem is formulated as:

$$\min_{x, D, \{\gamma_i\}} \; \sum_i \left( \| R_i x - D \gamma_i \|_2^2 + \nu \| \gamma_i \|_0 \right) + \lambda \| y - F_u x \|_2^2 \tag{3}$$

Here $R_i$ is an operator which extracts a spatio-temporal image patch at $i$, and $\gamma_i$ is the corresponding sparse code with respect to a dictionary $D$. In this approach, the regularisation terms force $x$ to be approximated by the reconstructions from the sparse code of patches. By taking the same approach, for our CNN formulation, we force $x$ to be well-approximated by the CNN reconstruction:

$$\min_{x} \; \| x - f_{\mathrm{cnn}}(x_u \,|\, \theta) \|_2^2 + \lambda \| F_u x - y \|_2^2 \tag{4}$$

Here $f_{\mathrm{cnn}}$ is the forward mapping of the CNN parameterised by $\theta$, possibly containing millions of adjustable network weights, which takes in the zero-filled reconstruction $x_u$ and directly produces a reconstruction as an output. Since $x_u$ is heavily affected by aliasing from sub-Nyquist sampling, the CNN reconstruction can therefore be seen as solving a de-aliasing problem in the image domain. The approach of Eq. (4), however, is limited in the sense that the CNN reconstruction and the data fidelity are two independent terms. In particular, since the CNN operates in the image domain, it is trained to reconstruct the sequence without a-priori information of the acquired data in $k$-space. However, if we already know some of the $k$-space values, then the CNN should be discouraged from modifying them, up to the level of acquisition noise. Therefore, by incorporating the data fidelity in the learning stage, the CNN should be able to achieve better reconstruction. This means that the output of the CNN is now conditioned on $\Omega$ and $\lambda$. Our final reconstruction is then given simply by the output, $x_{\mathrm{cnn}} = f_{\mathrm{cnn}}(x_u \,|\, \theta, \lambda, \Omega)$. Given training data $\mathcal{D}$ of input-target pairs $(x_u, x_{\mathrm{gnd}})$, where $x_{\mathrm{gnd}}$ is the fully-sampled ground-truth data, we can train the CNN to produce an output that attempts to accurately reconstruct the data by minimising an objective function:

$$\mathcal{L}(\theta) = \sum_{(x_u,\, x_{\mathrm{gnd}}) \in \mathcal{D}} \ell\left( x_{\mathrm{gnd}},\, x_{\mathrm{cnn}} \right) \tag{5}$$

where $\ell$ is a loss function. In this work, we consider an element-wise squared loss, which is given by $\ell(x_{\mathrm{gnd}}, x_{\mathrm{cnn}}) = \| x_{\mathrm{gnd}} - x_{\mathrm{cnn}} \|_2^2$.
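
A hedged PyTorch sketch of this objective, under the assumption (used later in the paper) that complex images are stored as two real channels; `model` is a placeholder for any de-aliasing network $f_{\mathrm{cnn}}$:

```python
import torch

def complex_to_channels(x):
    """Map a complex tensor of shape (N, H, W) to a 2-channel real tensor (N, 2, H, W)."""
    return torch.stack([x.real, x.imag], dim=1)

def training_loss(model, x_u, x_gnd):
    """Element-wise squared loss l(x_gnd, x_cnn) = ||x_gnd - x_cnn||_2^2, averaged per example."""
    x_cnn = model(complex_to_channels(x_u))            # CNN de-aliasing of the zero-filled input
    return torch.sum((complex_to_channels(x_gnd) - x_cnn) ** 2) / x_u.shape[0]
```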

Data Consistency Layer

Denote the Fourier encoding of the image reconstructed by the CNN as $s_{\mathrm{cnn}} = F x_{\mathrm{cnn}} = F f_{\mathrm{cnn}}(x_u \,|\, \theta)$, where $s_{\mathrm{cnn}}(j)$ represents the entry at index $j$ in $k$-space. The undersampled data $y \in \mathbb{C}^M$ can be mapped onto the vectorised representation of the $k$-$t$ coordinates (in $\mathbb{C}^N$) by $s_0 = F F_u^H y$, which fills the non-acquired indices in $k$-space with zeros. In order to incorporate the data fidelity in the network architecture, we first note the following: for fixed network parameters $\theta$, Eq. (4) has a closed-form solution $s_{\mathrm{rec}}$ in $k$-space, given element-wise as in 2. S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, May 2011. :

$$s_{\mathrm{rec}}(j) = \begin{cases} s_{\mathrm{cnn}}(j) & \text{if } j \notin \Omega \\ \dfrac{s_{\mathrm{cnn}}(j) + \lambda s_0(j)}{1 + \lambda} & \text{if } j \in \Omega \end{cases} \tag{6}$$

The final reconstruction in the image domain is then obtained by applying the inverse Fourier encoding, $x_{\mathrm{rec}} = F^H s_{\mathrm{rec}}$. The solution yields a simple interpretation: if the $k$-space coefficient $s_{\mathrm{rec}}(j)$ is initially unknown (i.e. $j \notin \Omega$), then we use the predicted value from the CNN. For the entries that have already been sampled ($j \in \Omega$), we take a linear combination of the CNN prediction and the original measurement, weighted by the level of noise present in $s_0$. In the limit $\lambda \to \infty$ we simply replace the $j$-th predicted coefficient in $\Omega$ by the original coefficient. For this reason, this operation is called a data consistency step in $k$-space (DC). In the case where there is non-negligible noise present in the acquisition, $\lambda = q/\sigma$ must be adjusted accordingly, where $q$ is a hyper-parameter and $\sigma^2$ is the power of the AWG noise in $k$-space (i.e. $\Re(e_i), \Im(e_i) \sim \mathcal{N}(0, \sigma/2)$). In 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. , it is empirically shown that $q \in [5 \times 10^5, 5 \times 10^6]$ for $\sigma^2 \in [4 \times 10^8, 10^9]$ works sufficiently well.
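
A minimal NumPy sketch of the element-wise data consistency step in Eq. (6); `mask` is the boolean indicator of $\Omega$, and passing `lam=None` emulates the noiseless limit $\lambda \to \infty$ (hard replacement of the acquired entries):

```python
import numpy as np

def data_consistency(s_cnn, s_0, mask, lam=None):
    """Blend the CNN k-space prediction with the acquired samples s_0 on Omega (Eq. 6)."""
    if lam is None:                                    # noiseless case: lambda -> infinity
        return np.where(mask, s_0, s_cnn)
    return np.where(mask, (s_cnn + lam * s_0) / (1.0 + lam), s_cnn)
```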

Since the DC step has a simple expression, we can in fact treat it as a layer operation of the network, which we denote as a DC layer. When defining a layer of a network, the rules for forward and backward passes must be specified in order for the network to be end-to-end trainable. This is because CNN training can effectively be performed through stochastic gradient descent, where one updates the network parameters $\theta$ to minimise the objective function $\mathcal{L}$ by descending along the direction given by the derivative $\partial \mathcal{L} / \partial \theta^T$. Therefore, it is necessary to define the gradients of each network layer relative to the network’s output. In practice, one uses an efficient algorithm called backpropagation 22. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533–536, Oct. 1986. , where the final gradient is given by the product of all the Jacobians of the layers contributing to the output. Hence, in general, it suffices to specify a layer operation $f_L$ for the forward pass and derive the Jacobian of the layer with respect to the layer input, $\partial f_L / \partial x^T$, for the backward pass.

A. Forward Pass

The data consistency in $k$-space can be simply decomposed into three operations: the Fourier transform $F$, the data consistency $f_{\mathrm{dc}}$ and the inverse Fourier transform $F^H$. The data consistency $f_{\mathrm{dc}}$ performs the element-wise operation defined in Eq. (6), which, assuming $s_0(j) = 0 \;\; \forall j \notin \Omega$, can be written in matrix form as:

$$f_{\mathrm{dc}}(s, s_0; \lambda) = \Lambda s + \frac{\lambda}{1 + \lambda} s_0 \tag{7}$$

Here $\Lambda$ is a diagonal matrix of the form:

$$\Lambda_{kk} = \begin{cases} 1 & \text{if } k \notin \Omega \\ \dfrac{1}{1 + \lambda} & \text{if } k \in \Omega \end{cases} \tag{8}$$

Combining the three operations defined above, we can obtain the forward pass of the layer performing data consistency in k-space:

$$f_L(x, y; \lambda) = F^H \Lambda F x + \frac{\lambda}{1 + \lambda} F_u^H y \tag{9}$$
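
Composing the three operations $F$, $f_{\mathrm{dc}}$ and $F^H$ gives the forward pass of Eq. (9); a hedged NumPy sketch, with the diagonal of $\Lambda$ built explicitly as in Eq. (8) and `k0` denoting the zero-filled k-space data $s_0 = F F_u^H y$:

```python
import numpy as np

def dc_layer_forward(x, k0, mask, lam=1e6):
    """f_L(x, y; lambda) = F^H ( Lambda F x + lambda/(1+lambda) s_0 ), applied to one frame."""
    diag_lambda = np.where(mask, 1.0 / (1.0 + lam), 1.0)     # diagonal of Lambda (Eq. 8)
    s = np.fft.fft2(x, norm="ortho")                         # F x
    s_rec = diag_lambda * s + (lam / (1.0 + lam)) * k0       # f_dc (Eq. 7), with s_0 zero off Omega
    return np.fft.ifft2(s_rec, norm="ortho")                 # F^H s_rec
```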

B. Backward Pass

In general, one requires Wirtinger calculus to derive a gradient in the complex domain 23. A. Faijul and K. Murase, “Learning algorithms in complex-valued neural networks using Wirtinger calculus,” in Complex-Valued Neural Networks: Advances and Applications. Hoboken, NJ, USA: Wiley, 2013, pp. 550–559. . However, in our case, the derivation greatly simplifies due to the linearity of the DFT matrix and the data consistency operation. The Jacobian of the DC layer with respect to the layer input $x$ is therefore given by:

$$\frac{\partial f_L}{\partial x^T} = F^H \Lambda F \tag{10}$$

Note that unlike many other applications where CNNs process real-valued data, MR images are complex-valued and the network needs to account for this. One possibility would be to design the network to perform complex-valued operations. A simpler approach, however, is to accommodate the complex nature of the data with real-valued operations in a dimensional space twice as large (i.e. we replace $\mathbb{C}^N$ by $\mathbb{R}^{2N}$). In the latter case, the derivations above still hold due to the fundamental assumption in Wirtinger calculus.

The DC layer has one hyperparameter $\lambda \in \mathbb{R}$. This value can be fixed or made trainable. In the latter case, the derivative $\partial f_{\mathrm{dc}} / \partial \lambda$ (a column vector here) is given by:

$$\left[ \frac{\partial f_{\mathrm{dc}}(s, s_0; \lambda)}{\partial \lambda} \right]_j = \begin{cases} 0 & \text{if } j \notin \Omega \\ \dfrac{s_0(j) - s_{\mathrm{cnn}}(j)}{(1 + \lambda)^2} & \text{if } j \in \Omega \end{cases} \tag{11}$$

and the update is $\Delta\lambda = J_e \, \partial f_{\mathrm{dc}} / \partial \lambda$, where $J_e$ is the error backpropagated via the Jacobians of the layers proceeding $f_{\mathrm{dc}}$.
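
In an automatic-differentiation framework the Jacobians of Eqs. (10)-(11) are computed for us, so the DC layer only needs its forward pass; below is a hedged PyTorch sketch (not the authors' implementation) of a DC layer with an optionally trainable $\lambda$, operating on complex tensors:

```python
import torch
import torch.nn as nn

class DataConsistency(nn.Module):
    """k-space data consistency layer (Eq. 6) with an optionally trainable lambda."""

    def __init__(self, lam_init=1.0, trainable=True):
        super().__init__()
        if trainable:
            self.lam = nn.Parameter(torch.tensor(float(lam_init)))
        else:
            self.register_buffer("lam", torch.tensor(float(lam_init)))

    def forward(self, x, k0, mask):
        # x: complex image batch (N, H, W); k0: acquired k-space, zero off Omega;
        # mask: 0/1 float tensor marking Omega. Autograd supplies the backward pass.
        s = torch.fft.fft2(x, norm="ortho")                                   # F x
        s_rec = (1 - mask) * s + mask * (s + self.lam * k0) / (1 + self.lam)  # Eq. (6)
        return torch.fft.ifft2(s_rec, norm="ortho")                           # F^H s_rec
```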

Cascading Network

For CS-based methods, in particular for DL-based methods, optimisation problems such as Eq. (3) are solved using a coordinate-descent type algorithm, alternating between the de-aliasing step and the data consistency step until convergence. In contrast, with CNNs, we perform one-step de-aliasing and the same network cannot be used to de-alias iteratively. While CNNs may be powerful enough to learn one-step reconstruction, such a network could show signs of overfitting unless there are vast amounts of training data. In addition, training such networks may require a long time as well as careful fine-tuning steps. It is therefore best to be able to use CNNs for iterative reconstruction approaches.

A simple solution is to train a second CNN which learns to reconstruct from the output of the first CNN. In fact, we can concatenate a new CNN on the output of the previous CNN to build extremely deep networks which iterate between intermediate de-aliasing and the data consistency reconstruction. We term this a cascading network. In fact, one can essentially view this as unfolding the optimisation process of DLMRI. If each CNN expresses the dictionary learning reconstruction step, then the cascading CNN can be seen as a direct extension of DLMRI, where the whole reconstruction pipeline can be optimised from training, as seen in Fig. 4. In particular, owing to the forward and backward propagation rules defined for the DC layer, all subnetworks can be trained jointly in an end-to-end manner, yielding one large network.
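
A hedged PyTorch sketch of such a cascade for the 2D case: $n_c$ small convolutional de-aliasing blocks with residual connections, interleaved with a data consistency step (here the hard, noiseless variant for brevity). This is a simplified illustration rather than the released implementation.

```python
import torch
import torch.nn as nn

def hard_data_consistency(x, k0, mask):
    """Replace the acquired k-space entries (lambda -> infinity in Eq. 6)."""
    s = torch.fft.fft2(x, norm="ortho")
    s = torch.where(mask.bool(), k0, s)
    return torch.fft.ifft2(s, norm="ortho")

def cnn_block(n_d=5, n_f=64):
    """(n_d - 1) conv + ReLU layers followed by a projection back to 2 channels."""
    layers, ch = [], 2
    for _ in range(n_d - 1):
        layers += [nn.Conv2d(ch, n_f, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
        ch = n_f
    return nn.Sequential(*layers, nn.Conv2d(ch, 2, kernel_size=3, padding=1))

class CascadeNet(nn.Module):
    def __init__(self, n_c=5, n_d=5, n_f=64):
        super().__init__()
        self.cnns = nn.ModuleList([cnn_block(n_d, n_f) for _ in range(n_c)])

    def forward(self, x, k0, mask):
        # x: complex zero-filled input (N, H, W); k0, mask: acquired k-space and Omega indicator
        for cnn in self.cnns:
            x_ch = torch.stack([x.real, x.imag], dim=1)       # complex -> 2 real channels
            out = cnn(x_ch) + x_ch                            # residual de-aliasing step
            x = torch.complex(out[:, 0].contiguous(), out[:, 1].contiguous())
            x = hard_data_consistency(x, k0, mask)            # project back onto the acquired data
        return x
```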

Fig. 4.

A cascade of CNNs. DC denotes the data consistency layer and DS denotes the data sharing layer. The number of convolution layers within each network and the depth of the cascade are denoted by $n_d$ and $n_c$ respectively. Note also that the DS layer only applies when the input is a sequence of images.

Data Sharing Layer

For the case of reconstructing dynamic sequences, the temporal correlation between frames can be exploited as an additional regulariser to further de-alias the undersampled images. For this, we use 3D convolution to learn spatio-temporal features of the input sequence. In addition, we propose incorporating features that could benefit the CNN reconstruction, inspired by data sharing approaches 24. S. J. Riederer, T. Tasciyan, F. Farzaneh, J. N. Lee, R. C. Wright, and R. J. Herfkens, “MR fluoroscopy: Technical feasibility,” Magn. Reson. Med., vol. 8, no. 1, pp. 1–15, Sep. 1988. 25. V. Rasche, R. W. de Boer, D. Holz, and R. Proksa, “Continuous radial data acquisition for dynamic MRI,” Magn. Reson. Med., vol. 34, no. 5, pp. 754–761, Nov. 1995. 26. S. Zhang, K. T. Block, and J. Frahm, “Magnetic resonance imaging in real time: Advances using radial FLASH,” Magn. Reson. Imag., vol. 31, no. 1, pp. 101–109, Jan. 2010. : if the change in image content is relatively small for any adjacent frames, then the neighbouring $k$-space samples along the temporal axis often capture similar information. In fact, as long as this assumption is valid, for each frame, we can fill the entries using the samples from the adjacent frames to approximate missing $k$-space samples. Specifically, for each frame $t$, all frames from $t - n_{\mathrm{adj}}$ to $t + n_{\mathrm{adj}}$ are considered, filling the missing $k$-space samples at frame $t$. If more than one frame within the range contains a sample at the same location, we take the weighted average of the samples. The idea is demonstrated in Fig. 2.

Fig. 2.

The illustration of the data sharing approach. The acquired lines, which can be seen as $n_{\mathrm{adj}} = 0$, are colour-coded for each time frame. For each $n_{\mathrm{adj}}$, the missing entries in each frame are aggregated using the values from up to $\pm n_{\mathrm{adj}}$ neighbouring frames. The overlapped lines are averaged.

An example of data sharing with $n_{\mathrm{adj}} = 2$ applied to Cartesian undersampling is shown in Fig. 3(a). As data sharing aggregates the lines in $k$-space, the resulting images can be seen as a zero-filled reconstruction from a measurement with a lower undersampling factor. In practice, however, cardiac sequences contain highly dynamic content around the heart and hence combining the adjacent frames results in data inconsistency around the dynamic region, as illustrated in Fig. 3(b,c,d). However, for CNN reconstruction, we can incorporate these images as an extra input to train the network rather than treating them as the final reconstructions. Note that the reduction in the apparent acceleration factor is non-trivial to calculate: if each frame samples 10% of $k$-space, combining 5 adjacent frames should in theory cover 50%. However, one often relies on variable density sampling, which samples low-frequency terms more often, yielding overlapped lines between the adjacent frames. Therefore, the apparent acceleration factor is often much lower. As a remedy, regular sampling can be considered. However, regular sampling results in coherent artefacts in the image domain, the removal of which is a different problem from the one we address here, which attempts to resolve incoherent aliasing patterns. Alternatively, one can perform a sampling trajectory optimisation to reduce the overlapping factor; however, this is out of scope for this work and will be investigated in future.

Fig. 3.

The illustration of the data sharing approach applied to the image and the mask from Fig. 1 (a,b). In this figure, (a) shows the appearance of the resulting sequence for $n_{\mathrm{adj}} = 2$. (b) The entries in $k$-space that are either acquired or aggregated using the data sharing approach with $n_{\mathrm{adj}} = 2$, which conceptually defines a sampling mask. (c) For comparison, we show the resulting zero-filled reconstruction if (b) were treated as a mask. (d) The error map between (a) and (b). One can observe their similarity except for the data inconsistency of the dynamic content around the heart region. Note that for $n_{\mathrm{adj}} = 2$, the obtained image has an appearance similar to an acceleration factor of around 4 (rather than 12/5 = 2.4, which is the maximum achievable from 5 frames) due to overlapping lines.

For our network, we implement data sharing (DS) layers which take an input image and generate multiple “data-shared” images for a range of $n_{\mathrm{adj}}$. The resulting images are concatenated along the channel-axis and treated as a new input fed into the first convolution layer of the CNNs. Therefore, using the images obtained from data sharing can be interpreted as transforming the problem into joint estimation of the aliasing as well as the dynamic motion, where the effect of aliasing is considerably smaller. Note that for the cascading network architecture, from the second subnetwork onwards, the input to each subnetwork is no longer “undersampled”, but instead contains intermediate predicted values from the previous subnetwork. In this case, we average all the entries from the adjacent frames and update the samples which were not initially acquired. For this work, we allocate equal weight to all adjacent $k$-space samples; however, in future, more elaborate averaging schemes can be considered.
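
A hedged NumPy sketch of the data sharing operation for a single choice of $n_{\mathrm{adj}}$: missing $k$-space entries at frame $t$ are filled with the (equally weighted) average of the samples acquired at the same location within $\pm n_{\mathrm{adj}}$ frames, while originally acquired entries are left untouched. Array shapes are illustrative assumptions.

```python
import numpy as np

def data_share(kspace, mask, n_adj):
    """kspace: complex array (T, ky, kx), zero at unsampled entries; mask: matching boolean pattern."""
    T = kspace.shape[0]
    shared = kspace.copy()
    for t in range(T):
        lo, hi = max(0, t - n_adj), min(T, t + n_adj + 1)
        counts = mask[lo:hi].sum(axis=0)                       # how many neighbours sampled each entry
        averaged = kspace[lo:hi].sum(axis=0) / np.maximum(counts, 1)
        fill = (~mask[t]) & (counts > 0)                       # only entries missing at frame t
        shared[t][fill] = averaged[fill]
    return shared

# A DS layer would concatenate data_share(...) results for n_adj = 0, ..., 5 along the channel axis.
```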

Architecture and Implementation

Incorporating all the new elements mentioned above, we can devise our cascading network architecture. Our CNN takes in a two-channeled sequence of images in $\mathbb{R}^{2 N_x N_y N_t}$, where the channels store the real and imaginary parts of the zero-filled reconstruction in the image domain. Based on the literature, we used the following network architecture for the CNN, illustrated in Fig. 4: it has $n_d - 1$ 3D convolution layers $C_i$, which are all followed by Rectified Linear Units (ReLU) as a choice of nonlinearity. For each of them, we used a kernel size $k = 3$ 27. C. Szegedy, “Going deeper with convolutions,” in Proc. IEEE CVPR, Jun. 2015, pp. 1–9. and the number of filters was set to $n_f = 64$. The final layer of the CNN module is a convolution layer $C_{\mathrm{rec}}$ with $k = 3$ and $n_f = 2$, which projects the extracted representation back to the image domain. We also used a residual connection 12. K. He, X. Zhang, S. Ren, and J. Sun. (Dec. 2015). “Deep residual learning for image recognition.” [Online]. Available: https://arxiv.org/abs/1512.03385 , which sums the output of the CNN module with its input. Finally, we form a cascading network by interleaving the DC layers with the CNN reconstruction modules $n_c$ times. For the DS layer, we take the input to each subnetwork and generate images for all $n_{\mathrm{adj}} \in \{0, 1, \dots, 5\}$. As aforementioned, the resulting images are concatenated along the channel-axis and fed to the first convolution layer. We found that this choice of architecture works sufficiently well; however, the parameters were not optimised and there is therefore room for refinement of the results presented. Hence the results are likely to be improved by, for example, incorporating pooling layers and varying parameters such as the kernel size and stride 14. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. 18th Int. Conf. MICCAI, Munich, Germany, 2015, pp. 234–241. , 28. F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” in Proc. ICLR, San Juan, Puerto Rico, 2016, pp. 1–9. .
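
A hedged PyTorch sketch of a single 3D de-aliasing module as described above: the DS layer supplies one image per $n_{\mathrm{adj}} \in \{0, \dots, 5\}$, concatenated along the channel axis (two real channels per image, giving 12 input channels), followed by $n_d - 1$ 3x3x3 convolution + ReLU layers and a final projection $C_{\mathrm{rec}}$ back to two channels. Adding the residual to the $n_{\mathrm{adj}} = 0$ channels is an assumption made for this illustration.

```python
import torch
import torch.nn as nn

class CNN3DModule(nn.Module):
    def __init__(self, n_adj_max=5, n_d=5, n_f=64):
        super().__init__()
        in_ch = 2 * (n_adj_max + 1)                 # data-shared images stacked channel-wise
        layers, ch = [], in_ch
        for _ in range(n_d - 1):
            layers += [nn.Conv3d(ch, n_f, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
            ch = n_f
        layers += [nn.Conv3d(ch, 2, kernel_size=3, padding=1)]   # C_rec: back to the image domain
        self.net = nn.Sequential(*layers)

    def forward(self, x_ds):
        # x_ds: (N, 2 * (n_adj_max + 1), T, H, W); channels 0:2 hold the n_adj = 0 (unshared) input
        return self.net(x_ds) + x_ds[:, :2]
```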

Our model can also be used for 2D image reconstruction by setting $N_t = 1$ and using 2D convolution layers instead; however, data sharing does not apply to 2D reconstruction. For the following experiments, we first explore the network configurations by considering 2D MR image reconstruction. We identify our network by the values of $n_c$, $n_d$ and the use of data sharing. For example, D5-C2 means a network with $n_d = 5$, $n_c = 2$ and no data sharing. D5-C10(S) corresponds to a network with $n_d = 5$, $n_c = 10$ and data sharing.

As mentioned, the pixel-wise squared error was used as the objective function. As the proposed architecture is memory-intensive, a small minibatch size is used to train the cascade networks. We used a minibatch size of 1 for all the experiments but did not observe any problem with convergence. We initialised the network weights using He initialisation 29. K. He, X. Zhang, S. Ren, and J. Sun. (Feb. 2015). “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.” [Online]. Available: https://arxiv.org/abs/1502.01852 . The Adam optimiser 30. D. Kingma and J. Ba. (Dec. 2014). “Adam: A method for stochastic optimization.” [Online]. Available: https://arxiv.org/abs/1412.6980 was used to train all models, with parameters $\alpha = 10^{-4}$, $\beta_1 = 0.9$ and $\beta_2 = 0.999$ unless specified. We also added $\ell_2$ weight decay of $10^{-7}$.
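
A hedged sketch of this training configuration (He initialisation, Adam with $\alpha = 10^{-4}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, weight decay $10^{-7}$); `model` stands for any of the cascade networks sketched earlier:

```python
import torch
import torch.nn as nn

def he_init(module):
    """He (Kaiming) initialisation for the convolution layers."""
    if isinstance(module, (nn.Conv2d, nn.Conv3d)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

def make_optimiser(model):
    model.apply(he_init)
    return torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999), weight_decay=1e-7)
```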

Experimental Results

A. Setup

1) Dataset:

Our method was evaluated using a cardiac MR dataset consisting of 10 fully sampled short-axis cardiac cine MR scans. Each scan contains a single-slice SSFP acquisition with 30 temporal frames, a 320×320 mm field of view and 10 mm slice thickness. The data consists of 32-channel data with a sampling matrix size of 192×190, which was zero-filled to a matrix size of 256×256. The raw multi-coil data was reconstructed using SENSE 31. K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P. Boesiger, “SENSE: Sensitivity encoding for fast MRI,” Magn. Reson. Med., vol. 42, no. 5, pp. 952–962, Nov. 1999. with no undersampling and retrospective gating. Coil sensitivity maps were normalised to a body coil image to produce a single complex-valued image set that could be back-transformed to regenerate complex $k$-space samples or further processed to form final magnitude images. For the following experiments, we perform retrospective undersampling, simulating a practical single-coil acquisition scenario.

2) Undersampling:

In this work, we focus on Cartesian undersampling, where one fully samples the frequency-encodes (along $k_x$) and randomly undersamples the phase encodes (along $k_y$). In addition, we pair consecutive phase encodes, which has been reported to reduce eddy currents, a source of image degradation 33. M. Markl, K. Scheffler, and O. Bieri, “Analysis and compensation of eddy currents in balanced SSFP,” Magn. Reson. Med., vol. 54, no. 1, pp. 129–137, Jul. 2005. . For each frame, the eight lowest spatial frequencies are always acquired and the other frequencies have a probability of being acquired determined by a zero-mean Gaussian variable density function that is marginally offset, such that the probability of acquisition never reaches zero even at the highest frequencies. An implementation of this approach can be found in 4. H. Jung, J. C. Ye, and E. Y. Kim, “Improved kt BLAST and kt SENSE using FOCUSS,” Phys. Med. Biol., vol. 52, no. 11, pp. 3201–3226, Jun. 2007. , and an example of a 2D mask and its effect on the magnitude of a temporal frame is shown in Fig. 5. For each experiment, the undersampling rate is fixed and will be stated. For training, the sampling masks were generated on-the-fly to allow the network to better learn the differences between potential aliasing artefacts and the underlying signal. Note that for each acceleration factor $\mathrm{acc}$, one can generate $\binom{k_y}{k_y/\mathrm{acc}}$ different masks.
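
A hedged NumPy sketch of such a 1D variable-density pattern (this is not the implementation from reference 4): the 8 central phase encodes are always kept, consecutive lines are paired, and the remaining pairs are drawn from a Gaussian density offset so that it never reaches zero; the Gaussian width is an illustrative assumption.

```python
import numpy as np

def variable_density_lines(n_ky, acc, n_centre=8, offset=0.01, rng=np.random.default_rng()):
    """Return a boolean mask over the n_ky phase-encode lines of one frame."""
    centre = np.arange(n_ky // 2 - n_centre // 2, n_ky // 2 + n_centre // 2)
    pairs = np.arange(0, n_ky - 1, 2)                          # first line of each phase-encode pair
    k = (pairs + 0.5 - n_ky / 2) / (n_ky / 2)                  # normalised frequency of each pair
    pdf = np.exp(-0.5 * (k / 0.25) ** 2) + offset              # Gaussian density that never reaches zero
    pdf /= pdf.sum()
    n_pairs = max(n_ky // acc - n_centre, 0) // 2
    chosen = rng.choice(pairs, size=n_pairs, replace=False, p=pdf)
    mask = np.zeros(n_ky, dtype=bool)
    mask[np.union1d(centre, np.concatenate([chosen, chosen + 1]))] = True
    return mask                                                 # broadcast along k_x and t as needed
```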

Fig. 5.

The detail of the Cartesian undersampling mask employed in this work. Note that the mask can be seen as a 3D volume indexed by ($k_x, k_y, t$). For each image frame $t$, we fully sample along the $k_x$-axis and undersample in the $k_y$ direction. We always acquire the 8 central lines, and the remaining lines are sampled according to a zero-mean Gaussian distribution whose tail is marginally offset so that it never reaches zero.

While Cartesian acquisition is the most common protocol in practice and offers straightforward implementation using fast Fourier transform (FFT), other practical sampling strategies such as radial 34. K. T. Block, M. Uecker, and J. Frahm, “Undersampled radial MRI with multiple coils. Iterative image reconstruction using a total variation constraint,” Magn. Reson. Med., vol. 57, no. 6, pp. 1086–1098, Jun. 2007. or spiral 35. M. Lustig, J. H. Lee, D. L. Donoho, and J. M. Pauly, “Faster imaging with randomly perturbed, under-sampled spirals and L1 reconstruction,” in Proc. ISMRM, Miami Beach, FL, USA, 2005, p. 685. could be considered, which achieve greater aliasing incoherence. Nevertheless, they require the use of methods such as nonuniform Fourier transforms and gridding 36. J. A. Fessler, “On NUFFT-based gridding for non-Cartesian MRI,” J. Magn. Reson., vol. 188, no. 2, pp. 191–195, Oct. 2007. which could propagate interpolation errors.

3) Data Augmentation:

Typically, deep learning benefits from large datasets, which are often not available for medical images. Our dataset is relatively small (300 images); however, the literature suggests that it is still possible to train a network by applying appropriate data augmentation strategies 14. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. 18th Int. Conf. MICCAI, Munich, Germany, 2015, pp. 234–241. . Therefore, we follow that practice and apply data augmentation including rigid transformation and elastic deformation to counter overfitting. Specifically, given each image (or a sequence of images), we randomly apply translation of up to $\pm 20$ pixels along the $x$- and $y$-axes, rotation in $[0, 2\pi)$, and reflection along the $x$-axis with a 50% chance. Therefore, from rigid transformation alone, we create 0.3 million augmented examples per image. Combined with the on-the-fly generation of undersampling masks, we generate a very large dataset. For the dynamic scenario, we further added elastic deformation, using the implementation in 32. P. Y. Simard, D. Steinkraus, and J. C. Platt, “Best practices for convolutional neural networks applied to visual document analysis,” in Proc. ICDAR, vol. 3. Aug. 2003, pp. 958–962. , with parameters $\alpha \in [0, 3]$ and $\sigma \in [0.05, 0.1]$, sampled uniformly, as well as reflection along the temporal axis. Note that while strong elastic deformation may produce anatomically unrealistic shapes, its use is justified as our goal is to train a network which learns to de-alias the underlying object in the image, rather than to explicitly learn anatomical shapes.
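
A hedged sketch of the rigid part of this augmentation (random translation of up to ±20 pixels, rotation in [0, 2π), reflection with 50% chance), applied identically to every frame of a sequence; SciPy interpolation is used here purely for illustration and is not necessarily what the authors used.

```python
import numpy as np
from scipy.ndimage import rotate, shift

def augment_rigid(seq, rng=np.random.default_rng()):
    """seq: complex array of shape (T, H, W); the same transform is applied to all frames."""
    dy, dx = rng.integers(-20, 21, size=2)                 # translation in pixels
    angle = rng.uniform(0.0, 360.0)                        # rotation angle in degrees
    flip = rng.random() < 0.5                              # reflection along the x-axis
    out = []
    for frame in seq:
        f = rotate(frame.real, angle, reshape=False) + 1j * rotate(frame.imag, angle, reshape=False)
        f = shift(f.real, (dy, dx)) + 1j * shift(f.imag, (dy, dx))
        out.append(f[:, ::-1] if flip else f)
    return np.stack(out)
```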

4) Evaluation Methodology:

For the 2D experiments, we split the dataset into training and testing sets of 5 subjects each. Each image frame in the sequence is treated as an individual image, yielding a total of 150 images per set. Note that typically, a portion of the training data is treated as a validation set utilised for early stopping 37. C. Bishop, Pattern Recognition and Machine Learning. New York, NY, USA: Springer-Verlag, 2006. , where one halts training if the validation error starts to increase. Initially, we used a 3–2–5 split for training, validation and testing. However, even after 3 days of training the cascade networks, we did not observe any decrease in the validation error. Therefore, we instead included the validation set in the training to further improve the performance, but fixed the number of backpropagations to be of the order of $10^5$, which we empirically found to be sufficient. For the dynamic experiments, we used a 7–3 split for training and testing and of the order of $10^4$ backpropagations.

To evaluate the performances of the trained networks, we used mean squared error (MSE) as our quantitative measure. The reconstruction signal-to-noise ratio from undersampled data is highly dependent on the imaging data and the undersampling mask. To take this into consideration for fair comparison, we assigned an arbitrary but fixed undersampling mask for each image in the test data, yielding a fixed number of image-mask pairs to be evaluated.

B. Reconstruction of 2D Images

1) Trade-Offs Between $n_d$ and $n_c$:

In this experiment we compared two architectures: D5-C2 ($n_d = 5$, $n_c = 2$) and D11-C1 ($n_d = 11$, $n_c = 1$) to evaluate the benefit of the DC step. The two networks have equivalent depths when the DC layers are viewed as feature extraction layers. However, the former can build deeper features of the image, whereas the latter benefits from the intermediate data consistency step. The undersampling rate was fixed to 3-fold and each network was trained end-to-end for $3 \times 10^5$ backpropagations.

The MSEs on the training and test data are shown in Fig. 6. Note that a gap between the performance on the training and test sets may exist by the nature of the dataset (e.g. due to image features, the initial level of aliasing, etc.) and therefore it is more informative to study the rate of improvement and the slope at the tail of the curves in combination to assess the overfitting process. Indeed, one can observe that D11-C1 eventually started to overfit the training data after about $1.2 \times 10^5$ backpropagations. As one would expect, since our dataset is small, deep networks can overfit easily. On the other hand, both train and test errors for D5-C2 were notably lower and had a relatively tighter gap, showing better generalisability compared to D11-C1. This is presumably because the architecture employs two data consistency steps and rebuilds the representations at each cascading iteration. This suggests that it is more beneficial to interleave DC layers, which project the acquired $k$-space onto intermediate reconstructions, with the CNN image reconstruction modules, which appears to help both the reconstruction as well as the generalisation. Nevertheless, there is a considerable gap between train and test data even for D5-C2. However, we note from the figure that even after $3 \times 10^5$ backpropagations, the test error is still improving. Therefore, although it seems that the network gets optimised to the features in the training data quickly, it still learns features generalisable to the test data. Having more training data is likely to accelerate the learning process.

Fig. 6.

A comparison of the networks with and without the intermediate DC step. D5-C2 shows superior performance over D11-C1. In particular, D5-C2 has considerably lower test error, showing an improved generalization property.

2) Effect of Cascading Iterations $n_c$:

In this experiment, we explored how much benefit the network can gain by increasing the number of cascading iterations. We fixed the architectures to have $n_d = 5$, but varied the cascading iteration $n_c \in \{1, 2, 3, 4, 5\}$. For this section, due to time constraints, we trained the networks using a greedy approach: we initialised the cascading net with $n_c = k$ using the net with $n_c = k - 1$ that was already trained. For each $n_c$, we performed $10^5$ backpropagations. Note that the greedy approach leads to a satisfactory solution; however, better results can be achieved with random initialisation, as initialising a network from another network's convergence point can make it more likely to get stuck in suboptimal local minima.

Reconstruction errors for each cascading network of different $n_c$ are shown in Fig. 7. We observed that while deeper cascading nets tend to overfit more, they still reduced the test error every time. The rate of improvement was reduced after 3 cascading layers; however, we see that the standard deviation of the error was also reduced for the deeper models. In the interest of space, we have not shown the resulting images of each D5-C$n_c$, but we have observed that increasing $n_c$ resulted in images with more of the subtle image details correctly reconstructed, and there was also less noise-like aliasing remaining in the images.

Fig. 7.

The effect of increasing the cascading iteration $n_c$. One can see that the reconstruction error on both training and test data monotonically decreases as $n_c$ increases. However, the rate of improvement is reduced after $n_c = 3$.

On the other hand, in Fig. 8, we show the intermediate reconstructions from each subnetwork within D5-C5 to better understand how the network exploits the iterative nature internally. In general, we see that the cascading net gradually recovers and sharpens the output image. Although the reconstruction error decreased monotonically at each cascading depth, we observed that the output of the fourth subnetwork appears to be more grainy than the output of the preceding subnetwork. This suggests the benefit of the end-to-end training scheme: since we are optimising the whole pipeline of reconstruction, the additional CNNs are internally used to rectify the error caused by the previous CNNs. In this case, the fourth subnetwork appears to counteract over-smoothing in the third subnetwork.

Fig. 8.

2D reconstruction results of D5-C5 for one of the test subjects. Here we inspect the intermediate output from each subnetwork in D5-C5. (a) Ground truth. (b) The input to the network, a 3× undersampled image. The output of the (c) first, (d) second, (e) third and (f) fourth cascading subnetwork respectively. (g,h) The final output and the corresponding error. Note that these are not the reconstruction results from the networks in the experiment in VII-B2.

3) Comparison With DLMRI:

In this experiment, we compared our model with the state-of-the-art DL-based method, DLMRI, for reconstructing individual 2D cardiac MR images. The comparison was performed for 3-fold and 6-fold acceleration factors.

a) Models:

For CNN, we selected the parameters $n_d = 5$, $n_c = 5$. To ensure a fair comparison, we report the aggregated result on the test set from two-way cross-validation (i.e. two iterations of training on five subjects and testing on the other five). For each iteration of the cross-validation, the network was trained end-to-end using He initialisation 29. K. He, X. Zhang, S. Ren, and J. Sun. (Feb. 2015). “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification.” [Online]. Available: https://arxiv.org/abs/1502.01852 . For 6-fold undersampling, we initialised the network using the parameters obtained from the trained models from 3-fold acceleration. Each network was trained for $3 \times 10^5$ backpropagations, which took one week per network on a GeForce TITAN X; however, our manual inspection of the loss curve indicates that the training error plateaued at a much earlier stage, approximately within 3 days.

For DLMRI, we used the implementation from 2. S. Ravishankar and Y. Bresler, “MR image reconstruction from highly undersampled k-space data by dictionary learning,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1028–1041, May 2011. with patch size 6×6. Since DLMRI is quite time-consuming, in order to obtain results within a reasonable amount of time we trained a joint dictionary over all time frames of each subject and reconstructed the frames in parallel; we did not observe any decrease in performance from this approach. For each subject, we ran 400 iterations to obtain the final reconstruction.

b) Results:
b) 结果:

The means of the reconstruction errors across 10 subjects are summarised in Table I. For both 3-fold and 6-fold acceleration, one can see that the CNN consistently outperformed DLMRI, and that the standard deviation of the error made by the CNN was smaller. The reconstructions from 6-fold acceleration are shown in Fig. 9. Although both methods suffered from significant loss of structure, the CNN was still capable of preserving texture better than DLMRI (highlighted by the red ellipse), whereas DLMRI created block-like artefacts due to over-smoothing. 6× undersampling of these images typically approaches the limit of sparsity-based methods; however, the CNN was able to predict some anatomical details that DLMRI could not. This could be due to the fact that the CNN has more free parameters to tune, allowing the network to learn complex but more accurate end-to-end transformations of the data.

TABLE I

The Result of 2D Reconstruction. DLMRI vs. CNN Across 10 Scans

二维重建的结果。DLMRI对比CNN,横跨10次扫描。

Fig. 9.

The comparison of 2D reconstructions from DLMRI and CNN for test data. (a) The original (b) 6× undersampled (c,d) CNN reconstruction and its error map (e,f) DLMRI reconstruction and its error map. There are larger errors in (f) than in (d), and the red ellipse highlights the anatomy that was reconstructed better by the CNN than by DLMRI.

在测试集上 DLMRI 和 CNN 二维重建的对比。 (a) 原始图像 (b) 6× 降采样 (c,d) CNN重建和它的误差图 (e,f) DLMRI重建和它的误差图。相比 (d),(f) 有更大的误差,而红色椭圆标出的解剖结构也是CNN重建得比DLMRI更好。

c) Comparison of reconstruction speed:
c) 重建速度的对比:

While training the CNN is time consuming, once it is trained, the inference can be done extremely quickly on a GPU. Reconstructing each slice took 23 ± 0.1 milliseconds on a GeForce GTX 1080, which enables real-time applications. To produce the above results, DLMRI took about 6.1 ± 1.3 hours per subject on CPU. Even though we do not have a GPU implementation of DLMRI, it is expected to take longer than 23ms because DLMRI requires dozens of iterations of dictionary learning and sparse coding steps. Using a fixed, pre-trained dictionary could remove this bottleneck in computation although this would likely be to the detriment of reconstruction quality.
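
Per-slice latency figures such as the 23 ms quoted above can be measured with a simple harness along the following lines; the reconstruct callable is a stand-in for the trained model, and on a GPU framework the device must be synchronised before reading the clock.

import time
import numpy as np

def mean_latency(reconstruct, x, n_repeats=100):
    reconstruct(x)                      # warm-up run (compilation, caching)
    start = time.perf_counter()
    for _ in range(n_repeats):
        reconstruct(x)                  # on GPU: synchronise inside or after this call
    return (time.perf_counter() - start) / n_repeats

# e.g. latency for a single 256x256 complex-valued slice
# print(mean_latency(trained_cnn, np.zeros((256, 256), dtype=np.complex64)))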

C. 3D Experiments

C. 3D实验

For the following experiments, we split our dataset into training and testing sets containing seven and three subjects respectively. Compared to the 2D case, we have significantly less data. As mentioned above, we applied elastic deformations in addition to rigid transformations to augment the training data and increase the variation of the examples seen by the network. Furthermore, working with a large input is a burden on memory, limiting the size of the network that can be used. To address this, we trained our model on inputs of size 256×Npatch×30, where the direction of patch extraction corresponds to the frequency-encoding direction. In this way, we can train the network with the same aliasing patterns while reducing the input size. Note that the extracted patches of an image sequence have different k-space values than the original data once the field-of-view (FOV) is reduced. As such, this trick only works for training, where the patches can be treated as new instances of training data. At test time, since only the raw data with full FOV is available, the CNN must be applied to the entire volume in order to perform the data consistency step correctly.
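
A sketch of this patch-extraction trick is given below, assuming an Nx × Ny × Nt complex image sequence in which Ny is the fully sampled frequency-encoding (readout) direction; cropping along this axis leaves the aliasing pattern along the phase-encoding direction unchanged, so the crops can serve as extra training instances. The axis layout, patch width and stride are assumptions for illustration.

import numpy as np

def extract_patches(seq, n_patch=32, stride=32):
    # seq: complex image sequence of shape (Nx, Ny, Nt); Ny is assumed to be the
    # fully sampled frequency-encoding direction, so cropping along it does not
    # change the aliasing along the phase-encoding direction.
    _, ny, _ = seq.shape
    starts = range(0, ny - n_patch + 1, stride)
    return np.stack([seq[:, y:y + n_patch, :] for y in starts])

# training: patches of shape (256, n_patch, 30) are treated as new instances;
# at test time the full field of view must be used so that the data-consistency
# step sees the true k-space of the acquired data.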

1) Effect of Data Sharing:

1) 数据共享的作用:

In this experiment, we evaluated the effect of using the features obtained from data sharing. We trained the following two networks: D5-C10(S) (nd=5, nc=10 with data sharing) and D6-C10 (nd=6, nc=10 without data sharing). In the second network, data sharing is replaced by an additional convolution layer to account for the additional input. We trained each model to reconstruct the sequences from 9-fold undersampling for 2.5×10^4 backpropagations. Their learning curves are plotted in Fig. 10. There is a considerable difference in their errors: the error of D5-C10(S) was smaller for both training and test data, suggesting that it was able to learn a de-aliasing strategy that generalises better. Moreover, by using data sharing, the network was able to learn faster. The visualisation of their reconstructions can be found in the following section.

Fig. 10.

The effect of data sharing. The network with data sharing shows superior performance to the one without it. In particular, it has considerably lower test error, showing an improved generalisation property.

数据共享的效果。有数据共享的网络展现出更好的性能。特别地,它的测试误差大幅降低,体现了更好的泛化特性。
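
To make the data-sharing input concrete, the sketch below fills each frame's unsampled k-space lines with the nearest acquired samples from neighbouring frames; the array layout, neighbourhood size and nearest-frame search are assumptions rather than the exact scheme used in the paper.

import numpy as np

def data_sharing(k_u, mask, n_adjacent=2):
    # k_u:  zero-filled k-space of shape (Nx, Ny, Nt)
    # mask: binary sampling mask of the same shape (1 where acquired)
    _, _, nt = k_u.shape
    k_shared = k_u.copy()
    filled = mask.astype(bool)
    for t in range(nt):
        for dt in range(1, n_adjacent + 1):
            for t2 in (t - dt, t + dt):
                if 0 <= t2 < nt:
                    # copy samples still missing in frame t but acquired in frame t2
                    take = ~filled[..., t] & mask[..., t2].astype(bool)
                    k_shared[..., t][take] = k_u[..., t2][take]
                    filled[..., t] |= take
    return k_shared

The inverse FFT of k_shared would then provide the data-shared images that are supplied to the network as additional input alongside the zero-filled reconstruction.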

2) Comparison With State-of-the-Art:

2) 与其他先进成果的比较:

In this experiment, we compared our model with state-of-the-art methods: DLTG 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. , kt-SLR 20. S. G. Lingala, Y. Hu, E. DiBella, and M. Jacob, “Accelerated dynamic MRI exploiting sparsity and low-rank structure: Kt SLR,” IEEE Trans. Med. Imag., vol. 30, no. 5, pp. 1042–1054, May 2011. and L+S 21. R. Otazo, E. Candes, and D. K. Sodickson, “Low-rank plus sparse matrix decomposition for accelerated dynamic MRI with separation of background and dynamic components,” Magn. Reson. Med., vol. 73, no. 3, pp. 1125–1136, 2015. for reconstructing the dynamic sequence. We compared the results for 3, 6, 9 and 11-fold acceleration factors.

a) Models:
a) 模型:

For the CNN, we used nd=5, nc=10 with data sharing as explained above. We also set the weight decay to 0 as we did not notice any overfitting of the model. Contrary to the 2D case, we trained each network as follows: we first pre-trained the network on various undersampling rates (0–9×) for 5×10^4 backpropagations. Subsequently, each network was fine-tuned for a specific undersampling rate using Adam with the learning rate reduced to 5×10^-5 for 10^4 backpropagations. We performed three-way cross-validation (two iterations training on 7 subjects and testing on the other 3, and one iteration training on 6 subjects and testing on the other 4) and aggregated the test errors. The pre-training and fine-tuning stages took approximately 3.5 days and 14 hours respectively on a GeForce GTX 1080. Since training is time-consuming, we did not train the networks for longer, but we speculate that they would benefit from further training with lower learning rates. For DLTG, we used the default parameters described in 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. . For kt-SLR, we performed a grid search to identify the optimal parameters for the data, which were μ1 = 10^-5, μ2 = 10^-8 and ρ = 0.1. Similarly, for L+S the optimal parameters were λ_L = 0.01 and λ_S = 0.01.
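
A sketch of how the variable undersampling rates used for pre-training might be drawn is given below, using a random Cartesian ky–t mask with a fully sampled low-frequency centre; the generator, centre width and sampling distribution are simplifications rather than the exact scheme of the paper.

import numpy as np

def random_cartesian_mask(ny, nt, acc, n_centre=8, rng=np.random):
    # Binary ky-t mask: fully sampled centre plus randomly chosen phase-encoding
    # lines, so that roughly ny / acc lines are kept per frame.
    mask = np.zeros((ny, nt), dtype=np.float32)
    centre = np.arange(ny // 2 - n_centre // 2, ny // 2 + n_centre // 2)
    mask[centre, :] = 1
    n_keep = max(int(ny / acc) - n_centre, 0)
    outer = np.setdiff1d(np.arange(ny), centre)
    for t in range(nt):
        mask[rng.choice(outer, size=n_keep, replace=False), t] = 1
    return mask

# pre-training: draw the acceleration factor at random within the 0-9x range
# for every training example; fine-tuning: fix 'acc' to the target rate and
# lower the Adam learning rate (here 5e-5).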

b) Result:
b) 结果:

The final reconstruction errors are summarised in Fig. 11. We see that the CNN consistently outperforms the state-of-the-art methods for all undersampling factors. For a low acceleration factor (3× undersampling), all methods performed approximately the same; however, for more aggressive undersampling factors, the CNN was able to reduce the error by a considerable margin. At aggressive undersampling rates, the performance of kt-SLR and L+S, which employ low-rank and simple sparsity constraints, degraded much faster. We speculate that they underperformed in this regime because the data is not exactly low-rank (our temporal dimension is already small) and because their sparsifying transforms (temporal FFT for L+S and temporal gradient for kt-SLR) lack the adaptability to data of the CNN and DLTG. Reconstructions from 9-fold undersampling are shown in Fig. 12, including those of the CNN without data sharing and of DLTG; the reconstructions of kt-SLR and L+S were omitted as their quantitative errors were already much worse. One can see that, as in the 2D case, at aggressive undersampling rates the dictionary-learning-based method produced blocky artefacts, whereas the CNN methods were capable of reconstructing finer details (indicated by the red ellipse). On the other hand, the CNN without data sharing shows grainy, noise-like artefacts: even though it reconstructed the underlying anatomy more faithfully than DLTG, its overall error was worse. This artefact was not present in the images reconstructed by the CNN with data sharing. Although the quantitative result is not shown, the CNN without data sharing in fact outperformed DLTG at the low acceleration factor (3×) but not at more aggressive undersampling factors. This suggests that when aliasing is severe, a more drastic transformation is required; for the CNN to do better in this case, we would either need to increase its depth, at a higher computational cost, or use more training samples. This confirms the importance of data sharing and the necessity of exploiting domain knowledge to simplify the learning problem when data is limited. Temporal profiles of the reconstructions are shown in Fig. 13. Even though data sharing itself introduces data inconsistency in highly dynamic regions, the CNN was able to rectify this internally and reconstructed the correct motion with smaller errors than the other methods. This suggests the CNN's capability to jointly de-alias and implicitly estimate dynamic motion.

Fig. 11.

The reconstruction errors of the CNN vs. state-of-the-art methods across 10 subjects for different undersampling rates. Note that we average the test errors over all iterations of cross-validation.

在不同降采样率下,对比10个被试的CNN与各先进方法的重建误差。注意我们对所有交叉验证迭代的测试误差取了平均。

Fig. 12.

The comparison of cardiac MR image sequence reconstructions from DLTG and CNN. Here we show the n-th slice from one of the test subjects. (a) The original (b) 9× undersampled (c,d) CNN with data sharing and its error map (e,f) CNN without data sharing and its error map (g,h) DLTG reconstruction and its error map. Red ellipses highlight the anatomy that was reconstructed better by the CNN than by DLTG.

DLTG 和 CNN 重建心脏磁共振图像序列的对比。这里我们展示了其中一个测试被试的第 n 层。 (a) 原始图像 (b) 9× 降采样 (c,d) 有数据共享的CNN和它的误差图 (e,f) 没有数据共享的CNN和它的误差图 (g,h) DLTG 重建和它的误差图。红色椭圆标出的是CNN重建得比DLTG更好的解剖结构。

Fig. 13.

The comparison of reconstructions along the temporal dimension. Here we extract the 110th slice along the y-axis from the previous figure. (a) The original (b) 9× undersampled (c,d) CNN with data sharing and its error map (e,f) CNN without data sharing and its error map (g,h) DLTG reconstruction and its error map.

沿着时间维度的重建结果对比。这里我们从前面的图片中沿着y轴提取了第110层。 (a) 原始图像 (b) 9× 降采样图像 (c,d) 有数据共享层的CNN和它的误差图。 (e,f) 没有数据共享层的CNN和它的误差图 (g,h) DLTG 重建和它的误差图。

c) Reconstruction with noise:
c) 在有噪声的环境中重建:

This section analyses the impact of acquisition noise on reconstruction performance. In this experiment we fixed the acceleration factor to 3 and varied the level of noise in the data. Specifically, we tested noise powers σ^2 ∈ [10^-9, 4×10^-8]. For fully-sampled reconstruction, these noise powers are equivalent to peak signal-to-noise ratio (PSNR) values of 41.84 dB and 25.81 dB for 10^-9 and 4×10^-8 respectively, where PSNR was calculated as 10 log10(1/MSE). The results are summarised in Fig. 14, where we aggregate the reconstruction error from all 10 subjects. The input level of noise is indicated by PSNR_f and, for consistency, the reconstruction results are also reported as PSNR (higher is better). For DLTG, we used the value λ = 5×10^-6 recommended in 3. J. Caballero, A. N. Price, D. Rueckert, and J. V. Hajnal, “Dictionary learning and time sparsity for dynamic MR data reconstruction,” IEEE Trans. Med. Imag., vol. 33, no. 4, pp. 979–994, Apr. 2014. . DLTG showed decent robustness to noise, owing to the underlying K-SVD, whose sparse coding has a denoising effect. For kt-SLR and L+S, we used the same parameters as before. They showed some robustness to small noise but did not perform well in the presence of aggressive noise, as the implementations (and the data consistency step in particular) do not explicitly account for it. Changing these implementations is likely to improve the results.

Fig. 14.

The aggregated test error across 10 subjects with injected noise. For each value of input noise power, PSNR_f is shown, together with the corresponding reconstruction PSNR of CNN-NAD, CNN-AD, DLTG, kt-SLR and L+S.

10个注入噪声被试的综合测试误差。不同的输入噪声功率用PSNRf表示,相应展示了 CNN-NAD、 CNN-AD、 DLTG、 kt-SLR 和 L+S 的重建PSNR。

Fig. 15.

The reconstruction with noise σ^2 = 4×10^-8. The aggregated test error across 10 subjects with injected noise. For each value of input noise power, PSNR_f is shown, together with the corresponding reconstruction PSNR of the CNN, the fine-tuned CNN, DLTG, kt-SLR and L+S.

σ^2 = 4×10^-8 噪声下的重建结果。测试误差综合了10个注入噪声的被试数据。不同的输入噪声功率用 PSNR_f 表示,并相应展示了 CNN、微调后的 CNN、DLTG、kt-SLR 和 L+S 的重建PSNR。
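
To make the noise model and the PSNR definition above concrete, a small sketch follows; adding complex Gaussian noise in k-space and assuming intensities normalised to a peak value of 1 are our assumptions for illustration.

import numpy as np

def add_noise(k0, sigma2, rng=np.random):
    # complex Gaussian noise of total power sigma^2 added to the acquired samples
    s = np.sqrt(sigma2 / 2.0)
    return k0 + rng.normal(0, s, k0.shape) + 1j * rng.normal(0, s, k0.shape)

def psnr(x_rec, x_ref):
    # PSNR = 10 log10(1 / MSE), assuming intensities normalised to [0, 1]
    mse = np.mean(np.abs(x_rec - x_ref) ** 2)
    return 10.0 * np.log10(1.0 / mse)

# note: the image-domain PSNR obtained for a given k-space noise power sigma^2
# depends on the FFT normalisation convention used for the k-space data.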

For the CNN, we used the model D5-C10(S) as before and tested the following two variants. First, we tested the performance of the CNN from the previous section, trained in the absence of noise, denoted CNN-NAD (blue curve). It can be seen that for low levels of noise (PSNR > 35 dB), CNN-NAD was able to maintain performance similar to the other methods. However, its performance degraded at almost the same rate as kt-SLR and L+S for high levels of noise. We then adapted CNN-NAD for noise as follows. First, we added noise to the training data, randomly sampling the noise power in the range [10^-9, 4×10^-8]. Second, we modified our data consistency layers to account for noise: we initialised λ of each DC layer as λ = q/σ = 0.025 (as in DLTG) and made these parameters trainable. We trained the network for 3×10^4 backpropagations; the result is denoted CNN-AD (green curve). Interestingly, the performance for very small noise (> 38 dB) became worse compared to the original CNN. However, beyond that point it showed a significant improvement for all levels of noise, demonstrating better robustness than the other methods. We also observed that after fine-tuning, λ had increased to 0.5. This signifies that, even though their reconstruction frameworks share an iterative nature, DLTG and CNN are fundamentally different approaches and the required parameter values also differ. Note that since we trained the network for a wide range of noise, the performance is likely to improve if a narrower noise range is selected for training. In practice, measuring the level of noise a priori is non-trivial; however, our CNN showed adaptability to a pre-specified range, which can indeed be simulated in practice.
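
For reference, the role of λ in the data-consistency layers can be written as the following k-space update, a standard weighted data-consistency rule consistent with the description above, where Ω denotes the set of acquired k-space locations and s_0 the acquired (possibly noisy) samples:

\hat{s}_{\mathrm{rec}}(k) =
\begin{cases}
\hat{s}_{\mathrm{cnn}}(k), & k \notin \Omega,\\
\dfrac{\hat{s}_{\mathrm{cnn}}(k) + \lambda\, s_{0}(k)}{1 + \lambda}, & k \in \Omega.
\end{cases}

In the noiseless limit λ → ∞ this reduces to replacing the predicted values by the acquired ones, whereas a finite (and here trainable) λ blends the CNN prediction with the noisy measurements.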

d) Reconstruction speed:
d) 重建速度:

As in the 2D case, DLTG is slow, taking 6.6 hours per subject on CPU. With the CNN, each sequence was reconstructed in 8.21 ± 0.02 s on average on a GeForce GTX 1080 GPU. This is significantly slower than reconstructing 2D images, as introducing a temporal axis greatly increases the computational cost of the convolution operations. Nevertheless, the reconstruction speed of our method is much faster than DLTG and is reasonably fast for offline reconstruction.

D. Memory Requirement

D. 内存需求

The memory requirement of the CNNs is determined by the number of network parameters, the number and sizes of the intermediate activation maps, and the space needed for computing the layer operations. The total number of network parameters is simply the sum over all layers. Each convolution layer has (k_x k_y k_t n_f' + 1) n_f parameters, where k_x, k_y, k_t are the kernel sizes along x, y and t, n_f' and n_f are the numbers of feature maps of the incoming and current convolution layers respectively, and the +1 accounts for the bias. For each DC layer, we also store one parameter for λ. For 2D reconstruction (k_t = 1), D5-C5 has about 0.6 million parameters, which occupy 2.3 MB of storage assuming single-precision floating point (N_precision = 4 bytes). For dynamic reconstruction, D5-C10(S) has 3.4 million parameters, which occupy about 13.6 MB.
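
The parameter counts above can be reproduced with a few lines; the layer configuration below (2 input/output channels for the real and imaginary parts, 64 feature maps, 3×3 kernels, nd convolution layers per subnetwork) is our reading of the architecture and should be treated as an assumption.

def n_conv_params(kx, ky, kt, nf_in, nf_out):
    # (kx*ky*kt*nf_in + 1) * nf_out, the +1 accounting for the bias
    return (kx * ky * kt * nf_in + 1) * nf_out

def cascade_params(nc, nd, kt=1, nf=64, k=3, n_ch=2):
    per_subnet = (n_conv_params(k, k, kt, n_ch, nf)             # first layer
                  + (nd - 2) * n_conv_params(k, k, kt, nf, nf)  # middle layers
                  + n_conv_params(k, k, kt, nf, n_ch))          # projection back to 2 channels
    return nc * (per_subnet + 1)                                # +1: lambda of each DC layer

print(cascade_params(nc=5, nd=5))             # ~0.57 M parameters (D5-C5, 2D, kt=1)
print(cascade_params(nc=5, nd=5) * 4 / 1e6)   # ~2.3 MB at 4 bytes per parameter
print(cascade_params(nc=10, nd=5, kt=3))      # ~3.4 M, roughly the dynamic D5-C10(S)
                                              # configuration, ignoring the extra
                                              # data-sharing input channels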

At the training stage, more than three times the number of parameters is required to compute and store the gradients. In addition, all intermediate activation maps need to be stored to perform backpropagation efficiently. For the proposed architecture, most of the activation maps belong to the convolution layers C_i; hence, their total size can be roughly estimated as N_batch N_x N_y N_t N_f n_c (n_d − 1) N_precision. With an input of size N_batch × N_x × N_y × N_t = 1×256×256×1, the memory required for the activation maps of D5-C5 is 335 MB. For the dynamic models, the memory requirement further increases by a factor of the temporal dimension N_t = 30. Therefore, the aforementioned trick of cropping the images along N_y is necessary to fit the model. For D5-C10(S), with an input of size 1×256×(256/8)×30, 2.4 GB is required for storing the activation maps alone. Finally, to obtain the total memory consumption for the training stage, this value needs to be further multiplied by factors that depend on the implementation of backpropagation, of operations such as convolution and FFT, and on any compilation optimisation performed by the library. For example, most implementations of backpropagation require twice the value above to account for the forward and backward passes. For our Theano implementation of D5-C10(S), the largest mini-batch size we could fit for the given input size on a GeForce GTX 1080 (8 GB) was 1.
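
Similarly, the dominant activation-map storage can be estimated directly from the expression above; this is a rough estimate that assumes 64 feature maps and ignores implementation-specific overheads.

def activation_bytes(n_batch, nx, ny, nt, nf=64, nc=5, nd=5, n_precision=4):
    # N_batch * N_x * N_y * N_t * N_f * n_c * (n_d - 1) * N_precision
    return n_batch * nx * ny * nt * nf * nc * (nd - 1) * n_precision

print(activation_bytes(1, 256, 256, 1) / 1e6)          # ~335 MB for D5-C5 (2D)
print(activation_bytes(1, 256, 32, 30, nc=10) / 1e9)   # D5-C10(S) with the cropped
                                                       # 1x256x(256/8)x30 input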

At the testing stage, the memory requirement is much lower because the intermediate activation maps do not need to be stored when only the forward pass is performed. In this case, the memory overhead is essentially that of the single largest activation map (a C_i layer), scaled by implementation-specific factors. Note that, as mentioned above, patch extraction cannot be used at test time. Nevertheless, we did not observe any problem using D5-C10(S) with an input of size 1×256×256×30 on a GeForce GTX 1080.

Discussion and Conclusion

讨论和总结

In this work, we evaluated the applicability of CNNs to the challenge of reconstructing undersampled cardiac MR image data. The experiments presented show that, using a network with interleaved data consistency stages, it is feasible to obtain a model which reconstructs images well. The CS and low-rank frameworks offer mathematical guarantees for signal recovery, which makes these approaches appealing in theory as well as in practice, even though the required sparsity cannot generally be genuinely achieved in medical imaging. Although CNNs offer no such guarantee, we have empirically shown that a CNN-based approach can outperform these methods. In addition, at very aggressive undersampling rates, the CNN method was capable of reconstructing most of the anatomical structures more accurately based on the learnt priors, while classical methods do not guarantee such behaviour.

Remarkably, we were able to train the CNN on a small dataset. We used several strategies to alleviate overfitting: first, as we employed the iterative architecture, each subnetwork has a relatively small receptive field and can therefore only perform local transformations. Second, we applied intensive data augmentation so that the network constantly sees new variations of the input, making it harder to overfit to any specific patterns. However, we speculate that, given more training data, we could drop the data augmentation and let the network learn coarser features by incorporating, for example, dilated or strided convolutions, which could further improve performance.

It is important to note that in the experiments presented the data was produced by retrospective undersampling of back transformed complex images (equivalent to single-coil data) obtained through an original SENSE reconstruction. Although the application of CNN reconstruction needs to be investigated in the more practical scenario of full array coil data from parallel MR, the results presented show a great potential to apply deep learning for MR reconstruction. The additional richness of array coil data has the potential to further improve performance, although it will also add considerable complexity to the required CNN architecture.

In this work, we showed that the network can be trained using arbitrary Cartesian undersampling masks of a fixed sampling rate, rather than selecting a fixed set of undersampling masks for training and testing. In addition, we were able to pre-train the network on various undersampling rates before fine-tuning it. This suggests that the network is capable of learning a generic strategy to de-alias the images. A further investigation should consider how tolerant the network is to different undersampling patterns, such as radial and spiral trajectories. As these trajectories produce aliasing artefacts with different properties, further validation is needed to determine the flexibility of our approach. Radial sampling, however, naturally fits the data sharing framework and can therefore be expected to push the performance of the network further. The data sharing approach may also make it feasible to adopt regular undersampling patterns, which are intrinsically more efficient. Another interesting direction would be to jointly optimise the undersampling mask within the learning framework.

To conclude, although CNNs learn only local representations, which should not affect global structure, it remains to be determined how the CNN approach behaves when a pathology or other more variable content is present in the images. We performed a cross-validation study to ensure that the network can handle unseen data acquired through the same acquisition protocol, but its generalisation properties must be evaluated carefully on a larger dataset. However, CNNs are flexible in the sense that one can incorporate application-specific priors into their objective functions to give more importance to preserving features of interest in the reconstruction, provided that such expert knowledge is available at training time. For example, analysis of cardiac images in clinical settings often employs segmentation and/or registration, and multi-task learning is a promising approach to further improve the utility of CNN-based MR reconstruction.