JMAG has a long track record of pursuing faster analysis through the development and fine-tuning of its calculation algorithms. In addition to improving single-core speed, JMAG makes effective use of hardware to process large-scale models in parallel.
JMAG has released a GPU solver and a High Parallel Solver as enhanced solutions for highly parallel, high-performance computing (HPC).
- Symmetric multiprocessing (SMP) (multi-CPU, multi-core support)
- Massively parallel processing (MPP)
- Graphics processing unit (GPU)
Symmetric multiprocessing (SMP) (multi-CPU, multi-core support)
The latest version of JMAG improves the convergence of ICCG (incomplete Cholesky conjugate gradient) and nonlinear calculations.
Massively parallel processing (MPP)
JSOL developed the JMAG High Parallel Solver (hereafter, the MPP solver) to achieve high-speed computation on a cluster system, in which multiple computers (hereafter, nodes) are connected by a high-speed network. The solver can use the multiple cores within each CPU as well as the multiple CPUs across the cluster, which raises the degree of parallelism in an analysis and increases calculation speed.
Calculation speed evaluation
This section describes the speed improvements obtained with the JMAG MPP solver. The following table shows the specifications of the hardware used in the test.
|CPU||Intel® Xeon® E5-2670|
|Number of cores / processor||8|
|Number of processors / node||2|
|Number of nodes||16|
Transient Response Analysis of Embedded Type PM Motors
We ran a transient response analysis over one electrical-angle period for a large-scale 3D PM synchronous motor (approx. 2.06 million elements). The analysis took only 2.5 hours with 32-way parallel processing and 1 hour 45 minutes with 64-way parallel processing, which is 13 times and 20 times faster, respectively, than conventional non-parallel computing.
Analysis Time (Embedded Type PM Motor)
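One way to read these figures is through Amdahl's law, which models the speedup on N parallel workers as S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the runtime that can be parallelized. The sketch below inverts that formula for the speedups reported above (13x at 32-way and 20x at 64-way). Amdahl's law is an idealization brought in here for illustration and is not a model stated in the source; the function name `parallel_fraction` is hypothetical.

```python
def parallel_fraction(speedup: float, workers: int) -> float:
    """Invert Amdahl's law S = 1 / ((1 - p) + p / N) to estimate p."""
    if workers < 2 or speedup <= 1.0:
        raise ValueError("need at least 2 workers and a speedup above 1")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


# Approximate speedups reported for the embedded type PM motor case.
REPORTED_SPEEDUPS = {32: 13.0, 64: 20.0}

for workers, speedup in REPORTED_SPEEDUPS.items():
    p = parallel_fraction(speedup, workers)
    print(f"{workers}-way: estimated parallel fraction ~{p:.3f}")
```

Under this simplified model, both data points imply that roughly 95 to 97 percent of the runtime parallelizes. Communication overhead between nodes is not modeled, so the estimate is only indicative.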
Bus Bar Frequency Response Analysis
A frequency response analysis was run for a large-scale 3D bus bar (approx. 2.42 million elements). Non-parallel processing required approx. 60 minutes of analysis time, whereas 32-way and 64-way parallel processing needed approx. 6.4 minutes and 4.6 minutes, respectively.
Analysis Time (bus bar)
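The speedup and parallel efficiency implied by the bus bar timings can be worked out directly. The short sketch below does the arithmetic for the approximate times quoted above. The function name is hypothetical, and efficiency is defined here as speedup divided by the degree of parallelism, which is a common convention rather than a metric the source reports.

```python
def speedup_and_efficiency(serial_minutes: float, parallel_minutes: float, workers: int):
    """Return (speedup, parallel efficiency) for one pair of timings."""
    speedup = serial_minutes / parallel_minutes
    efficiency = speedup / workers
    return speedup, efficiency


SERIAL_MINUTES = 60.0                  # non-parallel run (approximate)
PARALLEL_MINUTES = {32: 6.4, 64: 4.6}  # degree of parallelism -> minutes (approximate)

for workers, minutes in PARALLEL_MINUTES.items():
    speedup, efficiency = speedup_and_efficiency(SERIAL_MINUTES, minutes, workers)
    print(f"{workers}-way: {speedup:.1f}x speedup, {efficiency:.0%} efficiency")
```

With the quoted times, the speedups come out at about 9.4x and 13x. Doubling the degree of parallelism still shortens the run, but with diminishing returns.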
Graphics processing unit (GPU)
In recent years, the performance of the GPU (graphics processing unit) has improved greatly. A GPU has far more cores than a CPU and is far more effective at parallel processing.
Because of this strength in parallel processing, GPUs are now used as arithmetic devices in supercomputers in addition to their conventional role in image processing. GPUs have also attracted a lot of attention in the CAE field, and GPGPU (general-purpose computing on graphics processing units), which applies GPUs to general tasks such as numerical calculation, has been gaining popularity. We recognized the potential of GPGPU early and have continued development since we first provided a GPU solver in 2012.
Calculation speed evaluation
This section presents case studies that evaluate the JMAG GPU solver using NVIDIA’s Tesla K40, the latest GPU for numerical calculation at the time of the evaluation.
In numerical calculations, most of the calculation time is spent iteratively solving the linear equations produced by the finite element method. This is especially true for large-scale mesh models with millions of elements, where the linear solve accounts for a large proportion of the processing time. The JMAG GPU solver uses GPUs to accelerate this step. This section shows the reduction in analysis time achieved with a GPU in comparison with a shared-memory CPU parallel solver. The hardware specifications of the GPU and CPU used are shown below.
|Hardware||CPU Intel® Xeon® X5670||GPU NVIDIA® Tesla® K40|
|Clock frequency (GHz)||2.93||0.745|
|Number of cores||12 (2CPU)||2880 (1GPU)|
|Memory bandwidth (GB/s)||32||288|
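To make the iterative solve mentioned above more concrete, here is a minimal sketch of the plain conjugate gradient (CG) method in pure Python. It is not JMAG's solver: ICCG additionally applies an incomplete Cholesky preconditioner, which is omitted here, and the small tridiagonal matrix is only a stand-in for a finite element system. All function names are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A (list of lists).

    Returns (x, iterations). Unpreconditioned CG, for illustration only.
    """
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x, with x = 0 initially
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    if rs_old ** 0.5 < tol:
        return x, 0
    for iteration in range(1, max_iter + 1):
        Ap = matvec(A, p)  # dominant cost: one matrix-vector product
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new ** 0.5 < tol:
            return x, iteration
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


# 1D Laplacian-like system: tridiagonal (-1, 2, -1) with a unit right-hand side.
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x, iterations = conjugate_gradient(A, b)
print(x, iterations)
```

Each iteration is dominated by one matrix-vector product plus a few dot products and vector updates. These operations parallelize well across many cores, which is why this stage of the calculation is a natural target for GPU acceleration.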
Transient Response Magnetic Field Analysis of Embedded Type PM Motors
The following figure shows analysis times for two steps of a transient response magnetic field analysis on a 4-pole, 24-slot embedded type PM motor model. This model has approx. two million elements. Compared with the calculation time of a single-core CPU, the calculation is expected to be approx. 10x faster with one GPU and approx. 14x faster with two GPUs.
Analysis Time (Embedded Type PM Motors)
Transient Response Magnetic Field Analysis of Linear Motors
The following figure shows analysis times for two steps of a transient response magnetic field analysis on a linear motor model. This model has approx. 7.5 million elements. Compared with the calculation time of a single-core CPU, the calculation is expected to be approx. 4.2x faster with one GPU and approx. 4.6x faster with two GPUs.
Analysis Time (Linear Motors)
Induction Motor Transient Response Magnetic Field Analysis
Finally, this section shows analysis times for two steps of a transient response magnetic field analysis on an induction motor model with rotor skew. This model has approx. 9 million elements. The GPU memory of the Tesla K40 has been increased to 12 GB, which makes computation at this scale possible on a single GPU. Compared with the calculation time of a single-core CPU, the calculation is expected to be approx. 6.8x faster with one GPU and approx. 7.5x faster with two GPUs.
Analysis Time (Induction Motors)