Complex & Intelligent Systems
https://doi.org/10.1007/s40747-023-01230-0

ORIGINAL ARTICLE

EGFA-NAS: a neural architecture search method based on explosion gravitation field algorithm

Xuemei Hu1 · Lan Huang1,2 · Jia Zeng1 · Kangping Wang1 · Yan Wang1,3

Received: 3 March 2023 / Accepted: 3 September 2023 / Published online: 30 September 2023
© The Author(s) 2023

Abstract
Neural architecture search (NAS) is an extremely complex optimization task. Recently, population-based optimization algorithms, such as evolutionary algorithms, have been adopted as search strategies for designing neural networks automatically. Various population-based NAS methods are promising in searching for high-performance neural architectures. The explosion gravitation field algorithm (EGFA), inspired by the formation process of planets, is a novel population-based optimization algorithm with excellent global optimization capability and remarkable efficiency compared with classical population-based algorithms such as GA and PSO. Thus, this paper attempts to develop a more efficient NAS method, called EGFA-NAS, by utilizing the work mechanisms of EGFA: it relaxes the discrete search space to a continuous one and then utilizes EGFA and gradient descent in conjunction to optimize the weights of the candidate architectures. To reduce the computational cost, a training strategy that utilizes the population mechanism of EGFA-NAS is proposed. In addition, a weight inheritance strategy for the newly generated dust individuals is proposed for the explosion operation to improve performance and efficiency. The performance of EGFA-NAS is investigated in two typical micro search spaces, NAS-Bench-201 and DARTS, and compared with various kinds of state-of-the-art NAS competitors. The experimental results demonstrate that EGFA-NAS is able to match or outperform the state-of-the-art NAS methods on image classification tasks with a remarkable efficiency improvement.

Keywords: Neural architecture search · Explosion gravitation field algorithm · Complex optimization task · Deep neural networks

Corresponding authors: Lan Huang (huanglan@jlu.edu.cn) and Yan Wang (wy6868@jlu.edu.cn)
Xuemei Hu (huxm18@mails.jlu.edu.cn) · Jia Zeng (zengjia22@mails.jlu.edu.cn) · Kangping Wang (wangkp@jlu.edu.cn)

1 College of Computer Science and Technology, Jilin University, Changchun 130012, China
2 Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
3 School of Artificial Intelligence, Jilin University, Changchun 130012, China

Introduction

Deep neural networks (DNNs) have made significant progress in various challenging tasks, including image classification [1–4], object detection [5–7], and segmentation [8, 9]. One of the key factors behind this progress lies in the innovation of neural architectures.
For example, VGGNet [1] suggested the use of smaller convolutional filters and stacked a series of convolution layers to achieve better performance. ResNet [10] introduced residual blocks to benefit the training of deeper neural networks. DenseNet [11] designed densely connected blocks to stack features from different depths. Generally, manually designing a powerful and efficient neural network architecture requires a lot of expert experimentation and domain knowledge. It was not until recently that a series of neural architecture search (NAS) methods were proposed, bringing great convenience to ordinary users and learners and allowing them to benefit from the success of deep neural networks.

Generally, a NAS task can be regarded as a complex optimization problem. In machine learning and computational intelligence, population-based intelligent optimization algorithms, such as the genetic algorithm (GA) and particle swarm optimization (PSO), were widely adopted under the concept of neuroevolution to optimize the topology and hyperparameters of neural networks in the late 1990s [12–14]. Recently, many NAS methods employing population-based intelligent optimization algorithms as search strategies have attracted increasing attention. Although intelligent optimization algorithms, such as GA, have competitive search performance on various complex optimization tasks, they still suffer from high computational costs. This shortcoming is particularly pronounced in NAS tasks, since the NAS process involves a large number of architecture evaluations. More specifically, for a NAS task, each architecture evaluation involves the complete training of a deep neural network on a large amount of data from scratch. For example, Hierarchical EA [15] consumes 300 GPU days, and AmoebaNet-A [16] consumes 3150 GPU days, to search architectures on CIFAR-10.

In addition, reinforcement learning (RL) has also been adopted to design neural architectures automatically, as in [7, 17, 18]. A significant limitation of RL-based NAS methods is that they are also computationally expensive despite their remarkable performance. For example, it takes 2000 GPU days for the typical RL-based method NASNet-A to obtain an optimized CNN architecture on CIFAR-10. These methods require a large amount of computational resources, which is unaffordable for most researchers and learners. To reduce the computational cost, ENAS [18] proposed a parameter-sharing strategy, which shares weights among the architectures through the use of a superset and is adopted in various gradient-descent (GD) NAS methods, such as [19–21]. Compared with EA-based and RL-based NAS methods, GD-based NAS methods, which apply gradient descent to optimize the weights of candidate architectures, are usually more efficient.
However, GD-based NAS methods still have some limitations, such as requiring excessive GPU memory during the search and converging prematurely to a local optimum [22, 23].

Recently, some population-based methods, such as the various EA-based methods [15, 16, 24–28], have been utilized for NAS tasks and have achieved some progress. The explosion gravitation field algorithm (EGFA) [29], inspired by the formation process of planets, is a novel intelligent optimization algorithm with excellent global optimization capability and remarkable efficiency compared with the classical population-based optimization algorithms, such as GA and PSO. Nowadays, computational time and resource limitations remain the major bottleneck in using and developing NAS methods. Thus, this paper attempts to develop a more efficient NAS method by utilizing the work mechanisms of EGFA, so as to discover an optimal neural architecture with competitive learning accuracy while consuming only a little computational time and resources. Specifically, the proposed EGFA-NAS utilizes EGFA and gradient descent in conjunction to optimize the weights of the candidate architectures. To reduce the computational cost, EGFA-NAS proposes a training strategy that utilizes its population mechanism. To improve efficiency and performance, EGFA-NAS proposes a weight inheritance strategy for the newly generated dust individuals during the explosion operation. The main contributions of this paper are summarized as follows.

1. A novel population-based NAS method is proposed, called EGFA-NAS, which utilizes EGFA and gradient descent to optimize the weights of candidate architectures jointly, and is applicable to any universal micro search space with a fixed number of edges and a determined candidate operation set, such as the NAS-Bench-201 and DARTS search spaces.

2. A training strategy is proposed to reduce the computational cost by utilizing the population mechanism. Specifically, all dust individuals cooperate to complete the training of the dataset at each epoch. Although each dust individual is only trained on part of the batches at each epoch, it will be trained on all batches over a large number of epochs.

3. A weight inheritance strategy is proposed to improve performance and efficiency. Specifically, during the explosion operation, the weights w of each newly generated dust individual are inherited from the center dust. By utilizing this strategy, the newly generated dust individuals can be evaluated directly at the current epoch without retraining.

4. The experimental results show that the optimal neural network architectures searched by EGFA-NAS have competitive learning accuracy and require the least computational cost, compared with four kinds of state-of-the-art NAS methods.

The remainder of the paper is organized as follows. "Related work" introduces the related work. "Proposed NAS method" describes the details of the proposed NAS method. The experimental design and results are presented in "Experimental design" and "Experimental results", respectively.
The final part is the conclusion, placed in "Conclusion".

Related work

General formulation of the NAS task

NAS is an extremely complex optimization task, the primary objective of which is to transform the process of manually designing neural networks into automatically searching for optimal architectures. The process of NAS is depicted in Fig. 1. During the search, the search strategy samples a candidate architecture from the search space. We then train the architecture to convergence and evaluate its performance. Next, the search strategy picks another candidate architecture for training and evaluation according to the evaluation result of the previous architecture.

Fig. 1 Process of neural architecture search

In NAS tasks, denote a neural network architecture as A and the weights of all functions of the neural network as w_A. The goal of NAS is then to find an architecture A that achieves the minimum validation loss L_V after being trained by minimizing the training loss L_T, as shown in Eq. (1):

$\min_{A}\ L_V(w^{*}, A) \quad \text{s.t.} \quad w^{*} = \arg\min_{w} L_T(w, A), \qquad (1)$

where w* is the best weight of A, achieving the minimum loss on the training dataset. L_T and L_V are the losses on the training dataset and validation dataset, respectively. Both losses are determined not only by the architecture A but also by the weights w. This is a bi-level optimization problem [30], with A as the upper-level variable and w as the lower-level variable.
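To make the bi-level problem in Eq. (1) concrete, the sketch below shows the alternating first-order scheme that weight-sharing NAS methods typically use: the network weights w are updated on the training split and the architecture parameters A on the validation split. It is a minimal sketch, not the paper's implementation; `supernet`, its `weight_params()`/`arch_params()` accessors, the data loaders, and the learning rates are placeholders.

```python
# Minimal first-order sketch of the bi-level problem in Eq. (1):
# w is optimized on the training split, A on the validation split.
# `supernet`, `train_loader`, and `val_loader` are placeholders.
import torch

def alternate_search(supernet, train_loader, val_loader, epochs=10,
                     lr_w=0.025, lr_a=3e-4):
    criterion = torch.nn.CrossEntropyLoss()
    opt_w = torch.optim.SGD(supernet.weight_params(), lr=lr_w, momentum=0.9)
    opt_a = torch.optim.Adam(supernet.arch_params(), lr=lr_a)
    for _ in range(epochs):
        for (x_t, y_t), (x_v, y_v) in zip(train_loader, val_loader):
            # lower level: update weights w on the training loss L_T
            opt_w.zero_grad()
            criterion(supernet(x_t), y_t).backward()
            opt_w.step()
            # upper level: update architecture A on the validation loss L_V
            opt_a.zero_grad()
            criterion(supernet(x_v), y_v).backward()
            opt_a.step()
    return supernet
```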
NAS methods

The search strategy determines how to sample neural network architectures. According to the kind of search strategy, NAS methods can be roughly divided into three categories: EA-based, RL-based, and GD-based NAS methods.

EA-based NAS methods

EA-based NAS methods use evolutionary algorithms (EAs) to sample neural architectures. Early EA-based research on the optimization of networks was proposed under the concept of neuroevolution [12–14], which optimizes not only the network's topology but also the hyperparameters and connection weights associated with the network. Over the past years, EA-based NAS methods have attracted increasing attention. For example, Xie et al. published the first EA-based NAS work, GeNet [31], in 2017, which encodes the candidate architectures using fixed-length binary strings. Real et al. searched network architectures by EA, starting the search from trivial initial conditions [27]. Subsequently, Real et al. evolved an image classifier, AmoebaNet-A [16], which modifies tournament selection by introducing a concept of age and surpasses hand designs for the first time. Liu et al. proposed Hierarchical EA [15], which combines a novel hierarchical genetic representation scheme that imitates the modularized design pattern with an expressive search space. Elsken et al. proposed LEMONADE [24], an evolutionary algorithm for multi-objective architecture search. Suganuma et al. constructed CNN architectures based on Cartesian genetic programming (CGP) [25]. Sun et al. proposed CNN-GA [26] and AE-CNN [32], which evolve CNN architectures using GA based on ResNet and DenseNet blocks. To accelerate the fitness evaluation in evolutionary deep learning, Sun and Wang et al. proposed an end-to-end offline performance predictor based on the random forest [33].

Although the neural network architectures searched by the above EA-based NAS methods have achieved competitive performance compared with the state-of-the-art hand-designed CNNs, as population-based methods they still suffer from huge resource costs because they involve a large number of fitness evaluations. During the search phase, each newly generated candidate architecture needs to be trained on a training dataset and evaluated on a validation dataset, so most EA-based NAS methods are time-consuming. For example, to search architectures on the CIFAR-10 dataset, Hierarchical EA [15] needs 300 GPU days, AmoebaNet-A [16] needs 3150 GPU days, CNN-GA [26] needs 35 GPU days, and AE-CNN [32] needs 27 GPU days. It is therefore essential to accelerate the evaluation process for EA-based NAS methods, especially under the condition of limited computational resources.

RL-based NAS methods

The agent, the environment, and the reward are the three factors of reinforcement learning (RL). In the context of NAS, sampling network architectures from the search space by the controller is defined as the action of the agent, the performance of the network is regarded as the reward, and the controller is updated based on the reward in the next iteration. The earliest RL-based NAS method was proposed by Zoph et al. in 2017; it used RNNs as controllers to sample the network architecture and generated actions via policy gradients [7]. Subsequently, Zoph et al. used a proximal optimization strategy to optimize the RNN controller [17]. Cai et al. presented an RL-based algorithm, ProxylessNAS [34], which offers an alternative strategy to handle hardware metrics. BlockQNN [35] automatically builds high-performance networks using the Q-learning paradigm with an epsilon-greedy exploration strategy.

Earlier RL-based NAS methods are usually computationally expensive. To reduce the computational cost, work [17] proposed the well-known NASNet search space, which allows searching for the best cell on the CIFAR-10 dataset and then applying this cell to the ImageNet dataset by stacking together more copies of the cell.
ENAS [18] proposed a parameter-sharing strategy and the one-shot estimator (OSE), which regards all candidate architectures as subgraphs of a super-network, so that all candidate architectures can share parameters.

GD-based NAS methods

Recently, there has been increasing interest in adopting gradient-descent (GD) methods for NAS tasks. A typical GD-based NAS method is DARTS [19], which optimizes the network architecture parameters by GD after converting the discrete search space into a continuous one through a relaxation strategy. Subsequently, Dong et al. proposed GDAS [20], which develops a learnable differentiable sampler to accelerate the search procedure. Xie et al. proposed SNAS [21], which trains neural operation parameters and architecture distribution parameters with a novel search gradient. The above-mentioned ProxylessNAS [34] proposed a gradient-based approach to handle non-differentiable hardware objectives.

Compared with EA-based and RL-based NAS methods, GD-based NAS methods are very efficient, because they represent the structures of the candidate networks as directed acyclic graphs (DAGs) and use the parameter-sharing strategy. However, GD-based NAS methods have some drawbacks. For example, references [22, 23] point out that DARTS tends to select skip-connection operations, which leads to performance degradation of the searched architectures. To overcome this shortcoming of DARTS [19], several variants of DARTS have been proposed, such as DARTS− [36], DARTS+ [37], RC-DARTS [38], and β-DARTS [39].

Besides the above three kinds of NAS methods, there are also NAS methods that are not mentioned here or do not fully fall into the above categories. For example, Liu et al. proposed PNAS [40], which uses a sequential model-based optimization (SMBO) strategy.

Explosion gravitation field algorithm

The explosion gravitation field algorithm (EGFA) [29] is a novel optimization algorithm based on the original GFA [40–43], which simulates the formation process of planets based on the SNDM [44]. It was proposed by our research team in 2019 and has achieved good performance on optimization problems and tasks, such as benchmark functions [29] and feature selection tasks [45]. Compared with classical population-based intelligent algorithms, such as the genetic algorithm (GA) and particle swarm optimization (PSO), EGFA has better global optimization capability and remarkable efficiency. In addition, it has been proven that EGFA converges to the global best solution with probability 1 under some conditions [29].

In EGFA, all individuals are mimicked as dust particles with mass, and each of them belongs to a certain group. In every group, the particle with the maximum mass value is regarded as the center dust and the others are surrounding dust particles.
Based on the idea of the SNDM [44], each center dust attracts its surrounding dust through a gravitation field, and the gravitation field makes all surrounding dust particles move toward their centers. In EGFA, each dust particle can be represented by a four-tuple (location, mass, group, flag), where flag is a Boolean value indicating whether it is a center, location corresponds to a solution of the problem, group indicates the group number, and mass is the value of the objective function. A larger mass value indicates a better solution. There are six basic operations in EGFA, as shown in Fig. 2: (1) dust sampling (DS), (2) initialize, (3) group, (4) move and rotate, (5) absorb, and (6) explode. The detailed process of EGFA is summarized as follows:

Step 1: Subspace location by dust sampling (DS). The task of DS is to efficiently locate a search space small enough that it most likely contains the optimal solution.

Step 2: Initialize the dust population randomly based on the subspace located in Step 1.

Step 3: Divide the dust population into several subgroups randomly and calculate the mass value of all individuals. In each group, set the dust particle with the maximum mass value as the center and set its flag to 1; set the other individuals as surrounding dust particles and set their flags to 0.

Step 4: Check the stop condition. If the stop condition is met, return the best solution and terminate the algorithm; otherwise go to Step 5.

Step 5: Perform the movement and rotation operation. In each group, the center attracts its surrounding dust particles by the gravitation field, which makes all surrounding dust particles move toward their centers.

Step 6: Perform the absorption operation. Surrounding dust particles that are close enough to their centers are absorbed by the centers. The size of the dust population decreases in this process.

Step 7: Perform the explosion operation, in which new dust particles are generated around the centers. When the explosion operation is finished, the algorithm goes back to Step 4.

Fig. 2 Flow chart of EGFA

In addition, DS in Step 1 avoids a long iterative process because the algorithm only searches in a subspace that is small compared with the original search space. The explosion operation maintains the size of the population and keeps the algorithm from stagnating after falling into local optima.

In this work, we propose a NAS method based on the explosion gravitation field algorithm, EGFA-NAS for short. In EGFA-NAS, an individual (a dust particle) represents a candidate network architecture. EGFA-NAS aims to discover a network architecture with the best performance, such as accuracy on the test dataset. For the NAS task, a subspace small enough to contain the best architecture is hard to locate and computationally intensive to find. Therefore, EGFA-NAS abandons the first operation, DS.
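As a reading aid, the sketch below restates the dust four-tuple and Steps 2–7 in plain Python (Step 1, dust sampling, is omitted, just as EGFA-NAS omits it). It is schematic only: `objective`, `sample_location`, `move_toward`, and `spawn_near` are placeholder callables supplied by the caller, and the absorption rule shown simply keeps the strongest surviving particles, a simplification of the distance-based rule described above.

```python
# Schematic EGFA loop (Steps 2-7) built around the dust four-tuple.
import random
from dataclasses import dataclass

@dataclass
class Dust:
    location: list        # a candidate solution
    mass: float = 0.0     # objective value (larger is better)
    group: int = 0
    flag: bool = False    # True if this particle is its group's center

def egfa(objective, sample_location, move_toward, spawn_near,
         n=20, groups=2, absorptivity=0.5, epochs=80):
    dust = [Dust(sample_location(), group=i % groups) for i in range(n)]  # Step 2
    for d in dust:
        d.mass = objective(d.location)
    for _ in range(epochs):                                  # Step 4: stop condition
        for g in range(groups):                              # Step 3: pick group centers
            members = [d for d in dust if d.group == g]
            center = max(members, key=lambda d: d.mass)
            for d in members:
                d.flag = d is center
        centers = {d.group: d for d in dust if d.flag}
        for d in dust:                                       # Step 5: move and rotate
            if not d.flag:
                d.location = move_toward(d.location, centers[d.group].location)
                d.mass = objective(d.location)
        surrounding = sorted((d for d in dust if not d.flag),
                             key=lambda d: d.mass, reverse=True)
        keep = int(len(surrounding) * (1 - absorptivity))    # Step 6: absorb weak dust
        dust = list(centers.values()) + surrounding[:keep]
        while len(dust) < n:                                 # Step 7: explode around centers
            c = random.choice(list(centers.values()))
            loc = spawn_near(c.location)
            dust.append(Dust(loc, objective(loc), c.group))
    return max(dust, key=lambda d: d.mass)
```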
As a population-based method for the NAS task, there are several key issues to be addressed, namely: (1) which type of search space to search, (2) how to represent and encode a CNN, (3) how to accelerate the architecture evaluation process, and (4) how to use heuristic information to guide the search process.

Proposed NAS method

Micro search spaces, such as the NASNet [17], DARTS [19], and NAS-Bench-201 [23] search spaces, have recently been popular for NAS tasks; they search for neural cells to form blocks and construct the macro skeleton of the network by stacking multiple blocks multiple times, as in [16–20, 23, 46]. In this work, we propose an efficient NAS method for micro search spaces. To investigate the performance of the proposed method sufficiently, we choose two classical micro search spaces for testing, i.e., the NAS-Bench-201 and DARTS search spaces.

Representation of search space

In this work, we search for a computation cell as the building block of the final architecture and represent a cell as a directed acyclic graph (DAG). Specifically, a node represents the information flow, e.g., a feature map in a CNN, and an edge between two nodes denotes a candidate operation, i.e., one of the successful modules designed by human experts. We denote the candidate operation set as O. To process the intermediate nodes more efficiently in the forward propagation, two kinds of cells need to be searched: a normal cell with stride 1 and a reduction cell (block) with stride 2. Once the two kinds of cells are identified, we can stack multiple copies of the searched cells to make up a whole neural network. In the rest of this section, we introduce the two search spaces, NAS-Bench-201 and DARTS, respectively.

NAS-Bench-201

NAS-Bench-201 was proposed by Dong et al. [23] and is an algorithm-agnostic micro search space. Specifically, a cell from NAS-Bench-201 includes one input node and three computational nodes; the last computational node is also the output node for the next cell. Every edge in a cell has five candidate options. A cell in NAS-Bench-201 can therefore be represented as a fully connected DAG, and there are 5^6 = 15,625 cell candidates in total. In NAS-Bench-201, the candidate operation set O contains the following five operations: (1) zeroize, (2) skip-connection, (3) 1 × 1 convolution, (4) 3 × 3 convolution, and (5) 3 × 3 average pooling.

As shown in Fig. 3, the macro skeleton of NAS-Bench-201 is mainly stacked from three normal blocks connected by two reduction blocks. Each normal block consists of B normal cells. The reduction block is the basic residual block [10], which serves to down-sample the spatial size and double the channels of the input feature map. The skeleton starts with one 3 × 3 convolution and ends with a global average pooling layer that flattens the feature map into a feature vector.

Fig. 3 Macro skeleton of NAS-Bench-201
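The figure of 15,625 candidates follows directly from the cell topology: 6 edges between the 4 nodes, each carrying one of 5 operations. The short sketch below enumerates the discrete cell encodings to make that count explicit; the operation names follow NAS-Bench-201's convention and the list above.

```python
# Enumerate all discrete NAS-Bench-201 cells: 6 edges x 5 operations each.
from itertools import product

OPS = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]
EDGES = [(j, i) for i in range(1, 4) for j in range(i)]    # 6 edges of the 4-node DAG

cells = list(product(range(len(OPS)), repeat=len(EDGES)))  # one op index per edge
print(len(cells))                                          # 5**6 = 15625
example = {EDGES[e]: OPS[k] for e, k in enumerate(cells[0])}
```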
Additionally, work [23] evaluates each candidate architecture of NAS-Bench-201 on three different datasets: CIFAR-10, CIFAR-100 [47], and ImageNet-16-120 [48]. Hence, once the final architecture is found, the retraining process is not essential, and we can directly obtain the network's final performance through the API provided by [23].

DARTS search space

The DARTS [19] search space is a popular micro search space, proposed by Liu et al. in 2019, which is similar to the NASNet [17] search space but removes some unused operations and adds some powerful operations. Specifically, a cell from the DARTS search space contains two input nodes, four computational nodes, and one output node. The output node is the concatenation of the four computational nodes. As depicted in Fig. 4, there are 14 edges in a cell for search, and each edge has 8 options. Unlike NAS-Bench-201, the nodes in a cell are not fully connected during the search phase; moreover, during the evaluation phase, each node connects with only two previous nodes. In the DARTS search space, the candidate operation set O contains the following eight operations: (1) identity, (2) zeroize, (3) 3 × 3 depth-wise separable convolution, (4) 3 × 3 dilated depth-wise separable convolution, (5) 5 × 5 depth-wise separable convolution, (6) 5 × 5 dilated depth-wise separable convolution, (7) 3 × 3 average pooling, and (8) 3 × 3 max pooling.

As shown in Fig. 4, B normal cells are stacked into one normal block. A given image is forwarded through a 3 × 3 convolution and then through three normal blocks with two reduction cells in between. In this paper, we follow work [19] to set up the overall network architecture for the DARTS search space.

Fig. 4 Macro skeleton of DARTS search space

Overall search process

Figure 5 shows the overall search process of EGFA-NAS: (a) the operations on the edges are initially unknown; (b) the search space is continuously relaxed and candidate operations for the edges are sampled with the mixing probabilities; (c) the mixing probabilities and the weights of the cells are optimized simultaneously; (d) the final cell structure is inferred from the learned mixing probabilities.

Fig. 5 Overall search process

Representation and encoding of cell

As discussed in "Representation of search space", the cells to be searched in this work can be represented by DAGs. Specifically, each computational node represents a feature map, which is transformed from the previous feature maps. Each edge in the DAG is associated with an operation transforming the feature map from one node to another, and all possible operations are selected from the candidate operation set O. The output of any node j can then be formulated as Eq. (2):

$I_j = \sum_{i<j} o_{i,j}(I_i), \qquad (2)$

where I_i and I_j represent the outputs of node i and node j, respectively, and o_{i,j} represents the operation transforming the feature map from node i to node j, which is selected from the candidate operation set O.
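To illustrate Eq. (2), the sketch below computes the node outputs of a discrete cell represented as a DAG: each node sums the transformed outputs of all of its predecessors. The `ops` dictionary, which maps an edge (i, j) to a callable, stands in for the chosen candidate operations and would be torch.nn modules in a real implementation.

```python
# Discrete cell forward pass following Eq. (2): I_j = sum_{i<j} o_{i,j}(I_i).
# `ops[(i, j)]` is the operation chosen for edge (i, j); placeholders here.
def cell_forward(x, num_nodes, ops):
    outputs = [x]                         # node 0 is the cell input
    for j in range(1, num_nodes):
        outputs.append(sum(ops[(i, j)](outputs[i]) for i in range(j)))
    return outputs[-1]                    # last node is the cell output

# Example with identity placeholders on a 4-node (NAS-Bench-201 style) cell:
ops = {(i, j): (lambda t: t) for j in range(1, 4) for i in range(j)}
print(cell_forward(3.0, num_nodes=4, ops=ops))
```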
In NAS-Bench-201 [23], a normal cell contains four nodes, i.e., {I_i | 0 ≤ i ≤ 3}. I_0 is the output tensor of the previous layer, and I_1, I_2, I_3 are the output tensors of nodes 1, 2, and 3, calculated by Eq. (2). According to work [23], a normal cell contains six edges and each edge has five candidate operations.

In the DARTS search space, a cell contains seven nodes, i.e., {I_i | 0 ≤ i ≤ 6}. I_0 and I_1 are the input tensors, and I_2, I_3, I_4, I_5 are the output tensors of nodes 2, 3, 4, and 5. I_6 is the output of the cell, which is the concatenation of the four computational nodes, i.e., I_6 = concat(I_2, I_3, I_4, I_5).

Define e as the number of edges of a cell and |O| as the size of the candidate operation set O. According to the above description of the NAS-Bench-201 and DARTS search spaces, a cell can be encoded as A with size e × |O|. In NAS-Bench-201, e = 6 and |O| = 5, so A is a tensor of size 6 × 5. In the DARTS search space, e = 14 and |O| = 8, so A is a tensor of size 14 × 8. A general representation of a cell is formulated as Eq. (3):

$A = \begin{bmatrix} a_0^{0} & a_0^{1} & \cdots & a_0^{q} & \cdots & a_0^{|O|-1} \\ a_1^{0} & a_1^{1} & \cdots & a_1^{q} & \cdots & a_1^{|O|-1} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_p^{0} & a_p^{1} & \cdots & a_p^{q} & \cdots & a_p^{|O|-1} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{e-1}^{0} & a_{e-1}^{1} & \cdots & a_{e-1}^{q} & \cdots & a_{e-1}^{|O|-1} \end{bmatrix}, \qquad (3)$

where the pth row contains the mixing parameters of the candidate operations for edge p, and a_p^{q} represents the probability of sampling the qth candidate operation for edge p. In fact, the encoding of a cell as in Eq. (3) can be used for any micro search space whose cells have a fixed number of edges e and a defined candidate operation set O.
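As a concrete illustration of the e × |O| encoding in Eq. (3), the sketch below builds a random mixing matrix for a NAS-Bench-201 cell and derives the corresponding discrete cell by taking, for each edge, the operation with the highest probability, which is the rule used when the final architecture is inferred from the learned mixing probabilities. The random logits are purely illustrative.

```python
# Encode a cell as an e x |O| matrix of mixing probabilities (Eq. (3))
# and decode it to a discrete cell by per-edge argmax.
import numpy as np

OPS = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]
e, num_ops = 6, len(OPS)                      # NAS-Bench-201: 6 edges, 5 operations

rng = np.random.default_rng(0)
logits = rng.normal(size=(e, num_ops))
A = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # rows sum to 1

discrete_cell = [OPS[k] for k in A.argmax(axis=1)]   # one operation per edge
print(discrete_cell)
```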
Continuous relaxation of the search space

As described in "Representation of search space", a neural network architecture consists of many copies of the cell. These cells are sampled from the NAS-Bench-201 or DARTS search space. Specifically, from node j to node i, we sample the transformation function from the candidate operation set O with a discrete probability α^{(i←j)}. During the search, each node in a cell is calculated by Eq. (4):

$I_i = \sum_{j<i} \sum_{k=1}^{|O|} \alpha_k^{(i\leftarrow j)}\, o_k\big(I_j;\ w_k^{(i\leftarrow j)}\big), \qquad (4)$

where |O| is the number of candidate operations in the set O, α_k^{(i←j)} represents the probability that edge (i←j) (from node j to node i) selects the kth candidate operation as the transformation function, o_k represents the kth candidate operation, I_j is the output of node j, and w_k^{(i←j)} is the weight of the function o_k on edge (i←j). To make the search space continuous, we relax the probability of a particular operation α_k^{(i←j)} to a softmax over all possible operations by Eq. (5):

$\alpha_k^{(i\leftarrow j)} = \frac{\exp\big((a_k^{(i\leftarrow j)} + c_k)/\tau\big)}{\sum_{k'=1}^{|O|} \exp\big((a_{k'}^{(i\leftarrow j)} + c_{k'})/\tau\big)}, \qquad (5)$

where the c_k are i.i.d. samples drawn from Gumbel(0, 1), i.e., c_k = −log(−log(u)) with u ∼ Unif[0, 1], and τ is a softmax temperature; in this work, τ is set to 10, the same as in study [23].

Training strategy

In this work, we aim to reduce the computational cost by utilizing the population mechanism of EGFA-NAS. The main idea of the training strategy is illustrated in Fig. 6. Specifically, define D_T as the training dataset, batch_num as the number of batches of D_T, and n as the population size. At each epoch, each dust individual is trained on k batches, where k = ⌈batch_num/n⌉. All dust individuals cooperate to complete the training of the dataset at each epoch, and this process repeats until the maximum number of epochs is reached. Each dust individual (architecture) will be trained on many different batches, since the number of batches batch_num is usually larger than the population size n and the training process is repeated for a large number of epochs. In this work, we set batch_num = 98, n = 20, and k = 5 for CIFAR-10, and set the maximum number of epochs to 80 and 200 for the NAS-Bench-201 and DARTS search spaces, respectively. Although each dust individual (architecture) is trained only on a subset (1/n of the training data) at each epoch, it will be trained on all training data over a large number of epochs by this training strategy.

Fig. 6 Training strategy for EGFA-NAS
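The sketch below shows one way to implement this partitioning: at every epoch the batches of D_T are dealt out to the n dust individuals in contiguous chunks of k = ⌈batch_num/n⌉, and the chunk assigned to each individual shifts from epoch to epoch so that, over many epochs, every individual sees every batch. The rotation rule is our own illustrative choice, not a detail taken from the paper.

```python
# Illustrative batch-partitioning for the population training strategy:
# each of the n dust individuals trains on k = ceil(batch_num / n) batches
# per epoch; the assignment rotates across epochs (an assumption) so every
# individual eventually sees every batch.
import math

def batch_assignment(batch_num, n, epoch):
    k = math.ceil(batch_num / n)
    offset = (epoch * k) % batch_num          # rotate the split each epoch
    batches = [(offset + b) % batch_num for b in range(batch_num)]
    return [batches[i * k:(i + 1) * k] for i in range(n)]

# Example with the paper's CIFAR-10 setting: 98 batches, 20 individuals, k = 5.
assign = batch_assignment(batch_num=98, n=20, epoch=0)
print(len(assign), [len(a) for a in assign[:3]])   # 20 groups of (up to) 5 batches
```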
In addition, because each dust individual is responsible for only part of the training work and the complete training of each epoch is done with the participation of all individuals, the efficiency of EGFA-NAS is not sensitive to the setting of the population size n, which will be experimentally confirmed in "Parameter settings for NAS-Bench-201".

Explosion operation and weight inheritance

In the context of neural architecture search, a dust individual in EGFA-NAS represents a candidate architecture. It maintains not only the original four attributes (location, mass, group number, and a Boolean flag indicating whether it is a center, as described in "Explosion gravitation field algorithm"), but also an attribute w that records the weights of the functions in the cells. Each dust particle can therefore be represented by a five-tuple (location, w, mass, group, flag). In EGFA-NAS, the location is the operation mixing probability matrix A, so a neural network architecture can be represented as (A, w, mass, group, flag).

As a population-based NAS method, the main computational bottleneck of EGFA-NAS is the large number of architecture evaluations involved. In this work, we attempt to reduce the computational cost by taking advantage of the working mechanism of EGFA. On the one hand, at each epoch additional computational cost arises because a number of newly generated dust particles (architectures) need to be trained during the explosion operation. On the other hand, the new dust particles are generated based on the center dust, so there is a close relationship between the newly generated dust particles and their centers. Based on these two observations, we propose a weight inheritance strategy for the explosion operation. The details of the explosion operation in EGFA-NAS are described in Algorithm 1.

Algorithm 1 Explosion operation

Input: the size of the dust population n, the absorptivity abs, the number of epochs epoch_max, the maximum radius r_max and minimum radius r_min, the current epoch epoch_cur, the dust population Dust_absorb, the center dust center, the newly generated dust population Dust_new = ∅.
Output: the dust population Dust_explode.

1. compute the explosion radius r from r_max, r_min, epoch_cur, and epoch_max
2. for each group do
3.   for each newly generated individual dust_i do
4.     dust_i.A = center.A · (1 − r) + A_random · r
5.     dust_i.w = center.w   (weight inheritance)
6.     Dust_new = Dust_new ∪ {dust_i}
7.   end for
8. end for
9. for each individual dust_i in Dust_new do
10.   construct the architecture based on the parameters A and w of dust_i
11.   calculate L_T and L_V
12.   calculate dust_i.mass by Eq. (6)
13.   update dust_i.w
14. end for
15. Dust_explode = Dust_absorb ∪ Dust_new
16. for each group do
17.   update the center dust
18. end for
19. return Dust_explode
As shown in Algorithm 1, the first part (lines 1–8) is the process of generating new individuals based on the center dust: the mixing probabilities A of the candidate operations of dust_i are computed in line 4, and the weights w of the functions in the cells are inherited from the center dust in line 5. The second part (lines 9–14) calculates the mass value of each newly generated dust particle and updates the parameter w. Line 15 combines the dust population Dust_absorb (the output of the previous operation) with the newly generated dust population Dust_new. The last part (lines 16–18) updates the center dust of each group. By utilizing weight inheritance, the newly generated dust can be evaluated directly at the current epoch without retraining.

Figure 7 illustrates the process of generating new dust particles by means of weight inheritance during the explosion operation, where A_i denotes the mixing probabilities of the candidate operations for edge i and w_i records the weights of the functions for edge i. The right part of Fig. 7 shows the newly generated dust population of size m: the mixing probabilities A of the new dust particles are computed from their center as in line 4 of Algorithm 1, and the parameters w are inherited from their center dust particle as in line 5 of Algorithm 1.

Fig. 7 Process of generating new dust particles by weight inheritance during the explosion operation
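A compact sketch of the explosion step with weight inheritance is given below. New candidates blend the center's mixing matrix with a random matrix (line 4 of Algorithm 1) and reuse the center's trained weights (line 5), so they can be scored immediately. It is a simplified illustration: `evaluate_losses` is a placeholder for one forward pass on the current training and validation batches, the Dirichlet sampling of A_random is our own choice, and dust particles are plain dictionaries rather than the paper's data structures.

```python
# Sketch of the explosion operation with weight inheritance (Algorithm 1).
import copy
import numpy as np

def explode(center, m, r, evaluate_losses):
    """Generate m new dust particles around `center` with radius r in (0, 1]."""
    new_dust = []
    for _ in range(m):
        A_random = np.random.dirichlet(np.ones(center["A"].shape[1]),
                                       size=center["A"].shape[0])
        A_new = center["A"] * (1.0 - r) + A_random * r       # Algorithm 1, line 4
        w_new = copy.deepcopy(center["w"])                   # line 5: inherit weights
        L_T, L_V = evaluate_losses(A_new, w_new)             # lines 10-11, no retraining
        new_dust.append({"A": A_new, "w": w_new,
                         "mass": L_T + L_V,                  # Eq. (6)
                         "group": center["group"], "flag": False})
    return new_dust
```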
Process of EGFA-NAS

As described above, during the NAS process the two parameters, the architecture A and the weights w, need to be optimized simultaneously. To solve this bi-level optimization problem, we divide the original training dataset into two parts, a new training dataset D_T and a validation dataset D_V, and then use D_T to optimize the parameter w and D_V to optimize the parameter A. In EGFA-NAS, we apply EGFA and gradient descent jointly to optimize the parameter w and the architecture A simultaneously in an iterative way. The process of EGFA-NAS is described in detail as follows.

Step 1: Initialize all parameters, including the size of the dust population n, the number of groups g, the absorptivity abs for the absorption operation, the number of epochs epoch_max, and the maximum radius r_max and minimum radius r_min for the explosion strategy; initialize the dust population Dust = {dust_0, dust_1, ..., dust_{n−1}} randomly. For each dust_i, the location (the ith cell architecture dust_i.A) is initialized randomly as an e × |O| tensor, as in Eq. (3). After initialization, each cell can be stacked into a neural network, and the loss on the training dataset, L_T, and the loss on the validation dataset, L_V, can be calculated. To optimize the two parameters w and A simultaneously, we use Eq. (6) to evaluate the performance of a network architecture and denote it as the mass value of dust_i. Note that L_T and L_V are not the losses of the network architecture after full training, but the losses on the training dataset and validation dataset at the current epoch, respectively:

$dust_i.mass = L_T + L_V, \qquad (6)$

where the losses L_T and L_V are calculated by Eq. (7), the cross-entropy loss function [49]:

$L = -\frac{1}{s} \sum_{x} y \log \hat{y}, \qquad (7)$

where x represents a data sample, y is the true label, ŷ represents the predicted label, and s is the size of the data.

Step 2: Divide the dust population into g subgroups; in EGFA-NAS, g is set to 2. In each group, set the dust particle with the maximum mass as the center dust; the others are surrounding dust particles. For dust_i, the attribute flag is set as in Eq. (8), where best_mass_j is the maximum mass value in group j:

$dust_i.flag = \begin{cases} 1, & \text{if } dust_i.mass = best\_mass_j, \\ 0, & \text{otherwise.} \end{cases} \qquad (8)$

Step 3: Check the termination conditions. There are two termination conditions in EGFA-NAS: one is reaching the maximum number of epochs, and the other is based on the average change of the mass value of the dust population. Once one condition is met, the main loop of EGFA-NAS ends, the optimal network architecture A is returned, and the structure of the neural network is deduced; otherwise, go to Step 4.

Step 4: Perform the movement and rotation operation. The surrounding dust particles move toward the center dust. For each dust particle dust_i, the pace of movement is calculated by Eq. (9):

$\Delta A_1 = p \cdot \big(\exp(center.A + 3) - \exp(dust_i.A + 3)\big) + q \cdot A_{random}, \qquad (9)$

where center.A represents the cell structure of the center dust, dust_i.A represents the ith cell structure, and A_random is a 6 × 5 tensor generated randomly. p is the pace of movement and q is a value close to zero; in this work, we set p = 0.1 and q = 0.001. We denote the pace of the movement and rotation operation on the location of dust_i as ΔA_1. In addition, EGFA-NAS also applies gradient descent to optimize the parameters A and w. We denote the pace of gradient descent on the location of dust_i as ΔA_2, which is calculated by Eq. (10):

$\Delta A_2 = -\xi_2\, \nabla_{dust_i.A} L_V(dust_i.w,\ dust_i.A), \qquad (10)$

where ξ_2 is the learning rate and ∇_{dust_i.A} L_V represents the architecture gradient on the validation dataset.

As shown in Fig. 8, considering the impact of the above two factors on the cell structure A, the location of dust_i is updated as Eq. (11):

$dust_i.A = dust_i.A + \Delta A_1 + \Delta A_2. \qquad (11)$

During this process, for each dust particle dust_i we need to optimize not only the parameter dust_i.A but also the parameter dust_i.w, which is updated by Eq. (12):

$dust_i.w = dust_i.w - \xi_1\, \nabla_{dust_i.w} L_T(dust_i.w,\ dust_i.A), \qquad (12)$

where ξ_1 is the learning rate and ∇_{dust_i.w} L_T represents the weight gradient on the training dataset.
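The sketch below applies one Step-4 update to a single dust particle, combining the population-driven move toward the center (Eq. (9)) with the gradient-based corrections of Eqs. (10)–(12). It is illustrative only: gradients are obtained with PyTorch autograd, `loss_T` and `loss_V` are placeholder callables returning scalar losses that depend on the tensors `dust_w` and `dust_A` (both created with requires_grad=True), and drawing the random term from a standard normal is our own assumption about A_random.

```python
# One Step-4 update for a dust particle: Eqs. (9)-(11) for A, Eq. (12) for w.
# `loss_T(w, A)` and `loss_V(w, A)` are placeholders returning scalar losses.
import torch

def step4_update(dust_A, dust_w, center_A, loss_T, loss_V,
                 p=0.1, q=0.001, lr_w=0.025, lr_a=3e-4):
    # Eq. (9): population-driven move toward the center plus a small random term
    dA1 = p * (torch.exp(center_A + 3) - torch.exp(dust_A + 3)) \
          + q * torch.randn_like(dust_A)
    # Eq. (10): architecture gradient on the validation loss
    dA2 = -lr_a * torch.autograd.grad(loss_V(dust_w, dust_A), dust_A)[0]
    # Eq. (11): update the cell structure A
    new_A = (dust_A + dA1 + dA2).detach().requires_grad_(True)
    # Eq. (12): gradient descent on the training loss for the weights w
    grad_w = torch.autograd.grad(loss_T(dust_w, dust_A), dust_w)[0]
    new_w = (dust_w - lr_w * grad_w).detach().requires_grad_(True)
    return new_A, new_w
```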
<a href="#bookmark70">( 12)</a>.</p><p><em>dusti </em>.w = <em>dusti </em>.w − ξ1 ▽<em>dusti</em>.w<em>LT </em>(<em>dusti </em>.w, <em>dusti</em>.<em>A</em>),</p><p><a id="bookmark70"></a>(12)</p><p>where ξ1 is the learning rate, ▽<em>dusti</em>.w<em>LT </em>represents the archi- tecture gradient on training dataset.</p><p>Step 5: Perform the absorption operation. Some surround- ing dust particles with small mass value will be absorbed by their center dust. During this process, the size of dust population will change, the new size is determined by the absorptivity abs as Eq. <a href="#bookmark71">( 13)</a>.</p><p><a id="bookmark71"></a><em>n </em>= <em>n </em>* (1 − abs), (13)</p><p>where <em>n </em>is the size of the initial population, abs represents the absorptivity. In this work, we set abs as 0.5.</p><p><img src="/media/202408//1724856291.914452.png" />Step 6: Perform the explosion operation. During the pro- cess of Step 5, some dust particles with small mass value are absorbed by their center dust particles. To maintain the size of dust population, some new dust particles will be gen- erated around the center dust particles during this process. <a href="#bookmark60">This part is descripted in “Explosion operation and weights inheritance” in detail.</a></p><p>Once Step 6 finishes, go to Step 3.</p><p>According to the above detailed description of EGFA- NAS, the pseudo-code of EGFA-NAS is shown in Algorithm 1. Step 1 (lines 1–3) is the initialization. Step 2 (lines 4–5) is the operation of grouping. Step 3 (line 6) checks the ter- mination conditions. Step 4 (lines 7–12) is the process of movement and rotation. Step 5 (line 13) is the absorption operation. Step 6 (line 14) is the explosion operation.</p><table><tr><td></td></tr><tr><td><p>Algorithm 1: EGFA<s> </s>NAS</p></td></tr><tr><td><p>Input: the training data set Dr, the validation data set D, , the populationsize n , the number of group g , the absorptivity abs, the maximum radius Thax and minimum radius rmin for explosion strategy, the maximum and current number of epochs epoch , epoch cur=0 , dust population Dust <img src="/media/202408//1724856291.929507.png" /> , best dust particle best <img src="/media/202408//1724856291.937825.png" /> ·</p><p>output: center , best</p></td></tr><tr><td><p><img src="/media/202408//1724856292.04976.png" /></p><p>2. Dust e initialize the dustpopulationwith size of n randomly</p><p><img src="/media/202408//1724856292.0822191.png" /></p><p>4.Divide thedustpopulation into g grou ups</p><p><img src="/media/202408//1724856292.1216881.png" /></p><p>6. while termination conditions are not met do 7. for each individual dust (flag=0)do</p><p><img src="/media/202408//1724856292.185666.png" /></p><p>8. update dustiA by Eq. (9)-(11) 9. update dustiw by Eq. (12)</p><p>10. update dusti mass by Eq. (6) 11. end for</p><p><img src="/media/202408//1724856292.214438.png" /></p><p>12. update the center , best</p><p><img src="/media/202408//1724856292.2662.png" /></p><p><img src="/media/202408//1724856292.425337.png" /></p><p>15. update center , best 16. end while</p><p><a id="bookmark25"></a>17. return center , best</p></td></tr></table><p>i '</p><p>ΔA2</p><table><tr><td></td><td><p><img src="/media/202408//1724856292.4706602.png" /></p><p>ΔA1</p><p><img src="/media/202408//1724856292.51296.png" /></p><p><img src="/media/202408//1724856292.6065652.png" />></p><p><img src="/media/202408//1724856292.671936.png" /></p></td></tr><tr><td></td><td></td></tr></table><p><strong>Fig. 
Experimental design

The goal of EGFA-NAS is to automatically search for an optimal neural network architecture that achieves satisfactory performance on a complex task, such as image classification. For this purpose, a series of experiments is designed to demonstrate the advantages of the proposed EGFA-NAS compared with state-of-the-art NAS methods. First, we utilize the proposed EGFA-NAS to search neural network architectures in the benchmark search space NAS-Bench-201, and evaluate the performance of EGFA-NAS by investigating the classification accuracy and computational cost of the searched architectures on CIFAR-10, CIFAR-100, and ImageNet-16-120. Second, we investigate the consistency of the relative evaluation with the absolute evaluation, in terms of accuracy and loss. Third, we investigate the effectiveness of the weight inheritance strategy. Finally, we examine the proposed EGFA-NAS in the larger and more practical DARTS search space, and investigate the performance and universality of EGFA-NAS.

We first run the proposed EGFA-NAS in the benchmark search space NAS-Bench-201. When the search process terminates, the absolute performance evaluation of the optimal architecture can be obtained directly through NAS-Bench-201's API with negligible computational cost. By utilizing NAS-Bench-201, we verify the consistency of the relative performance evaluation and the absolute performance evaluation of the searched network architectures without retraining from scratch. In addition, we verify the effectiveness of weight inheritance in the NAS-Bench-201 search space. In contrast, when the search process in the DARTS search space terminates, the optimal network architecture needs to be retrained from scratch and tested on the test datasets; the test classification accuracy is reported as the result of our experiments. In the rest of this section, we introduce the peer competitors chosen for comparison with the proposed EGFA-NAS, the benchmark datasets, and finally the parameter settings for the two typical search spaces, NAS-Bench-201 and DARTS.

Peer competitors

To demonstrate the advantages of the proposed EGFA-NAS, a series of competitors are chosen for comparison.
"Competitors of NAS-Bench-201" introduces the competitors compared against the optimal architecture searched by EGFA-NAS in the NAS-Bench-201 search space, and "Competitors of DARTS search space" introduces the competitors compared against the optimal architecture searched by EGFA-NAS in the DARTS search space.

Competitors of NAS-Bench-201

Because NAS-Bench-201 (with only five candidate operations) is a smaller search space, and its best architecture has lower classification accuracy than the best ones searched in other search spaces, the performance of the optimal architecture searched by EGFA-NAS in NAS-Bench-201 is only compared with competitors that have reported results in the NAS-Bench-201 search space.

The selected competitors are mainly efficient GD-based NAS methods, including DARTS-V1 [19], DARTS-V2 [19], SETN [50], iDARTS [51], and GDAS [20]. The other three selected NAS competitors, namely ENAS [18], RSPS [22], and EvNAS [52], utilize RL, random search, and EA as their search strategies, respectively.

Competitors of DARTS search space

The DARTS search space is a functional search space for NAS tasks, in which the optimal network architecture has promising performance compared with the state-of-the-art manually designed CNN architectures. To compare the performance of the optimal network architecture searched by EGFA-NAS in the DARTS search space, we select four different kinds of competitors.

1. The first kind of competitors are the state-of-the-art CNN architectures manually designed by domain experts, including ResNet-101 [10], DenseNet-BC [11], SENet [53], IGCV3 [54], ShuffleNet [55], VGG [1], and Wide ResNet [56].

2. The second kind of competitors are the state-of-the-art EA-based NAS methods, including Hierarchical EA [15], AmoebaNet-A [16], LEMONADE [24], CGP-CNN [25], CNN-GA [26], AE-CNN [32], AE-CNN + E2EPP [33], LargeEvo [27], GeNet [31], SI-EvoNet [57], NSGA-Net [28], and MOEA-PS [58].

3. The third kind of competitors utilize RL to search for CNN architectures, such as NASNet-A [17], NASNet-A + CutOut [17], ProxylessNAS [34], BlockQNN [35], DPP-Net [59], MetaQNN [60], and ENAS [18].

4. The fourth kind of competitors are mainly GD-based NAS methods, such as DARTS-V1 + CutOut [19], DARTS-V2 + CutOut [19], RC-DARTS [38], and SNAS [21].
In addition, PNAS [<a href="#bookmark43">40</a>] is also selected for comparison; it uses a sequential model-based optimization (SMBO) strategy.</p><p><strong>Benchmark datasets</strong></p><p>To investigate the performance of EGFA-NAS on NAS tasks, we test EGFA-NAS in two different search spaces: NAS-Bench-201 and the DARTS search space. All experiments involve three benchmark datasets: CIFAR-10, CIFAR-100 [<a href="#bookmark51">47</a>], and ImageNet-16-120 [<a href="#bookmark52">48</a>], which are widely adopted in experimental studies of state-of-the-art CNNs and NAS methods. In this work, each architecture searched in NAS-Bench-201 is trained and evaluated on CIFAR-10, CIFAR-100 [<a href="#bookmark51">47</a>], and ImageNet-16-120 [<a href="#bookmark52">48</a>]. Each architecture searched in the DARTS search space is trained and evaluated on CIFAR-10 and CIFAR-100. Each dataset is split into three subsets: a training set, a validation set, and a test set.</p><p>CIFAR-10: an image classification dataset consisting of 60K images in ten classes. The original set contains 50K training images and 10K test images. Because a validation set is needed, the original training set is randomly split into two subsets of equal size, each containing 25K images across the ten classes. In this work, we regard one subset as the new training set and the other as the validation set.</p><p>CIFAR-100: it has the same images as CIFAR-10, but categorizes them into 100 fine-grained classes. The original CIFAR-100 contains 50K images in the training set and 10K images in the test set. In this work, the original training set is randomly split into two subsets of equal size; one is regarded as the training set and the other as the new validation set.</p><p>ImageNet-16-120: ImageNet is a large-scale and well-known dataset for image classification. ImageNet-16-120 was built with 16 × 16 pixels from the down-sampling variant of ImageNet [<a href="#bookmark86">61</a>] (i.e., ImageNet 16 × 16) and contains all images with labels ∈ [0, 119]. In sum, ImageNet-16-120 consists of 151.7K images for training, 3K images for validation, and 3K images for testing, with 120 classes.</p><p><a id="bookmark59"></a><strong>Parameter settings</strong></p><p>This section introduces the parameter settings for EGFA-NAS in detail.</p><p><a id="bookmark83"></a><strong>Table 1 </strong>Hyperparameter settings of the searching process</p><table><tr><td><p>Parameter</p></td><td><p>Value</p></td></tr><tr><td><p>Initial channels</p></td><td><p>16</p></td></tr><tr><td><p><em>B</em></p></td><td><p>5</p></td></tr><tr><td><p>Optimizer</p></td><td><p>SGD</p></td></tr><tr><td><p>Nesterov</p></td><td><p>1</p></td></tr><tr><td><p>Momentum</p></td><td><p>0.9</p></td></tr><tr><td><p>Batch size</p></td><td><p>256</p></td></tr><tr><td><p>LR scheduler</p></td><td><p>Cosine</p></td></tr><tr><td><p>Initial LR</p></td><td><p>2.5 × 10−2</p></td></tr><tr><td><p>min_LR</p></td><td><p>1 × 10−3</p></td></tr><tr><td><p>Weight decay</p></td><td><p>5 × 10−4</p></td></tr><tr><td><p>Random flip</p></td><td><p>0.5</p></td></tr></table><p><strong>Parameter settings for NAS-Bench-201</strong></p><p>For the NAS-Bench-201 search space, the parameter settings are only involved in the search process, because NAS-Bench-201 provides the absolute (final) performance evaluation for each architecture, and we can obtain the evaluation of the optimal architecture directly without retraining from scratch.
We adopt the same skeleton network as [<a href="#bookmark19">23</a>], shown in Fig. <a href="#bookmark50">3</a>. Specifically, we set the number of initial channels of the first convolution layer to 16 and the number of cells in one normal block <em>B</em> to 5. During the search, almost all parameter settings follow [<a href="#bookmark19">23</a>], as shown in Table <a href="#bookmark83">1</a>. Specifically, we train each architecture via Nesterov momentum SGD, using the cross-entropy loss as the loss function with batch size 256. We set the weight decay to 5 × 10−4 and decay the learning rate from 2.5 × 10−2 to 1 × 10−3 with a cosine annealing scheduler.</p><p>In the NAS-Bench-201 search space, we use the same hyperparameters for the three datasets CIFAR-10, CIFAR-100 [<a href="#bookmark51">47</a>], and ImageNet-16-120 [<a href="#bookmark52">48</a>], except for the data augmentation, due to the slight difference in image resolution. For CIFAR-10 and CIFAR-100, we use a random flip with probability 0.5, a random 32 × 32 crop with 4-pixel padding, and normalization over the RGB channels. For ImageNet-16-120, we use the same strategies, except that the random crop is 16 × 16 with 2-pixel padding.</p><p>The parameters listed in Table <a href="#bookmark83">1</a> are related to the neural network architecture. As a population-based method, EGFA-NAS has its own parameters. Specifically, we set the number of groups <em>g</em> to 2, the absorptivity abs to 0.5 for the absorb operation, the maximum radius <em>r</em>max to 0.1, and the minimum radius <em>r</em>min to 0.001 for the explosion operation.</p>
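<p>To make the settings above concrete, the following is a minimal, illustrative PyTorch sketch of the CIFAR-10 data pipeline (random flip, padded random crop, normalization, and the 25K/25K train/validation split) together with the SGD optimizer and cosine annealing schedule from Table 1. It is our own reconstruction, not the released EGFA-NAS code; the super-network is replaced by a tiny stand-in model, and the normalization statistics are the commonly used CIFAR-10 values, which the paper does not list.</p>
<pre><code>import torch
import torchvision
import torchvision.transforms as T

# Commonly used CIFAR-10 channel statistics (assumed; not specified in the paper).
MEAN, STD = (0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)

train_tf = T.Compose([
    T.RandomHorizontalFlip(p=0.5),      # random flip with probability 0.5
    T.RandomCrop(32, padding=4),        # 32 x 32 random crop with 4-pixel padding
    T.ToTensor(),
    T.Normalize(MEAN, STD),             # normalization over the RGB channels
])

full_train = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=train_tf)

# Split the original 50K training images into a 25K training set and a 25K validation set.
# (In practice the validation half would use a non-augmented transform.)
train_set, val_set = torch.utils.data.random_split(
    full_train, [25_000, 25_000], generator=torch.Generator().manual_seed(0))
train_loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=256, shuffle=False, num_workers=4)

# Tiny stand-in for the one-shot super-network (16 initial channels, B = 5 in the paper).
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(16, 10))

criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=2.5e-2, momentum=0.9,
                            nesterov=True, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=80, eta_min=1e-3)
</code></pre>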
<p>As a population-based NAS method, a larger number of epochs may lead to better performance, but the computational cost will also increase. We investigate the impact of the maximum number of epochs on the performance and computational cost on the CIFAR-10 dataset. The relative and absolute performance (accuracy) of the best architectures searched by EGFA-NAS on CIFAR-10 with different numbers of epochs is shown in Table <a href="#bookmark87">2</a>. The relative performance of the searched architectures is evaluated at the last epoch of the search phase without retraining. The absolute performance of the searched architecture is queried via the API provided by NAS-Bench-201. From the results in Table <a href="#bookmark87">2</a>, we can observe that the best performance (93.67% accuracy on CIFAR-10) is achieved when the number of epochs is set to 80. When the number of epochs is increased to 100, the absolute performance does not improve, while the computational cost increases. Hence, we set the number of epochs to 80 in the experiments for NAS-Bench-201.</p><p><a id="bookmark87"></a><strong>Table 2 </strong>Relative and absolute performance (accuracy) of the best architectures searched by EGFA-NAS on CIFAR-10 with different numbers of epochs</p><table><tr><td><p>Dataset</p></td><td><p>Number of epochs</p></td><td><p>Relative performance</p></td><td><p>Absolute performance</p></td><td><p>Search cost (GPU days)</p></td></tr><tr><td rowspan="5"><p>CIFAR-10</p></td><td><p>40</p></td><td><p>38.12</p></td><td><p>91.71</p></td><td><p>0.025</p></td></tr><tr><td><p>60</p></td><td><p>43.91</p></td><td><p>92.16</p></td><td><p>0.037</p></td></tr><tr><td><p>80</p></td><td><p>48.27</p></td><td><p>93.67</p></td><td><p>0.048</p></td></tr><tr><td><p>100</p></td><td><p>53.05</p></td><td><p>93.67</p></td><td><p>0.062</p></td></tr><tr><td><p>120</p></td><td><p>57.58</p></td><td><p>93.67</p></td><td><p>0.076</p></td></tr></table><p><a id="bookmark88"></a><strong>Table 3 </strong>Relative and absolute performance (accuracy) of the best architecture searched by EGFA-NAS on CIFAR-10 with different population sizes</p><table><tr><td><p>Dataset</p></td><td><p>Population size</p></td><td><p>Relative performance</p></td><td><p>Absolute performance</p></td><td><p>Search cost (GPU days)</p></td></tr><tr><td rowspan="5"><p>CIFAR-10</p></td><td><p>10</p></td><td><p>50.08</p></td><td><p>93.28</p></td><td><p>0.0481</p></td></tr><tr><td><p>15</p></td><td><p>49.00</p></td><td><p>93.36</p></td><td><p>0.0482</p></td></tr><tr><td><p>20</p></td><td><p>51.02</p></td><td><p>93.67</p></td><td><p>0.0482</p></td></tr><tr><td><p>25</p></td><td><p>48.83</p></td><td><p>93.67</p></td><td><p>0.0481</p></td></tr><tr><td><p>30</p></td><td><p>49.61</p></td><td><p>93.67</p></td><td><p>0.0482</p></td></tr></table><p>Note that all experimental settings are constrained by the computational resources available to us. All experiments are implemented with PyTorch 1.7 on one NVIDIA GeForce RTX 3090 GPU card. The computational cost is evaluated in terms of “GPU days”, calculated by multiplying the number of GPU cards by the search time in days, following [<a href="#bookmark16">19</a>, <a href="#bookmark38">20</a>, <a href="#bookmark89">62</a>].</p><p>Generally, population size is a vital factor for the performance and efficiency of a population-based method: a larger population size usually leads to better performance but also increases the search cost. In EGFA-NAS, however, we propose a training strategy that utilizes all dust individuals to complete the pass over the training data at each epoch.</p>
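<p>One plausible reading of this training strategy is that the mini-batches of an epoch are shared among the dust individuals, so that the population as a whole completes exactly one pass over the training data per epoch while each individual only trains on a fraction of it. The sketch below illustrates this reading; the function and variable names are ours, and the actual EGFA-NAS implementation may differ.</p>
<pre><code>import torch

def train_population_one_epoch(population, train_loader, criterion, device="cuda"):
    """`population` is a list of (model, optimizer) pairs, one per dust individual.
    The batches of one epoch are distributed round-robin over the individuals, so
    the whole population jointly covers the training data once per epoch."""
    n = len(population)
    for step, (images, labels) in enumerate(train_loader):
        model, optimizer = population[step % n]   # round-robin batch assignment
        model.train()
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
</code></pre>
<p>Under this reading, the per-epoch training cost stays roughly that of training a single network regardless of the population size, which is consistent with the nearly constant search cost reported in Table 3.</p>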
<p>This training strategy reduces the sensitivity of performance to the population size, which can be verified by the results in Table <a href="#bookmark88">3</a>. Specifically, with different population sizes, EGFA-NAS achieves not only similar performance but also similar search cost (GPU days). In addition, the architectures searched by EGFA-NAS achieve the best absolute performance when the population size <em>n</em> ≥ 20. In view of the above observations, we set the population size <em>n</em> to 20 in this work. In short, the absolute performance (accuracy) and search cost (GPU days) of EGFA-NAS are closely related to the maximum number of epochs, but not strongly related to the population size.</p><p><strong>Parameter settings for DARTS search space</strong></p><p>The neural cells for CNNs are searched in the DARTS search space on CIFAR-10/100 following [<a href="#bookmark4">7</a>, <a href="#bookmark14">17</a>]. The macro skeleton of the DARTS search space is shown in Fig. <a href="#bookmark53">4</a>. The parameter settings for the DARTS search space can be divided into two parts: (1) the searching phase and (2) the evaluation phase.</p><p>During the searching phase, we set the number of initial channels of the first convolutional layer to 16, the number of cells in a normal block <em>B</em> to 2, and the number of epochs to 200. For the training parameter w, we optimize each architecture via Nesterov momentum SGD with a batch size of 256, set the initial learning rate to 2.5 × 10−2, and anneal it down to 1 × 10−3 with a cosine annealing scheduler. We set the momentum to 0.9 and the weight decay to 5 × 10−4. To optimize the parameter <em>A</em>, we use the Adam optimizer with default settings.</p><p>During the evaluation phase, we train the searched network for 600 epochs in total. We set the initial channels to 33 and the number of cells in a normal block <em>B</em> to 6 or 8. We start with a learning rate of 2.5 × 10−2 and reduce it to 0 with the cosine scheduler. We set the probability of path drop to 0.2 and use an auxiliary tower with a weight of 0.4. Other parameter settings are the same as in the searching phase (Table <a href="#bookmark90">4</a>).</p><p><a id="bookmark90"></a><strong>Table 4 </strong>Hyperparameter settings for DARTS search space</p><table><tr><td><p>Parameter</p></td><td><p>Searching</p></td><td><p>Evaluation</p></td></tr><tr><td><p>Epochs</p></td><td><p>200</p></td><td><p>600</p></td></tr><tr><td><p>Initial channels</p></td><td><p>16</p></td><td><p>33</p></td></tr><tr><td><p><em>B</em></p></td><td><p>2</p></td><td><p>6/8</p></td></tr><tr><td><p>Optimizer</p></td><td><p>SGD/Adam</p></td><td><p>SGD</p></td></tr><tr><td><p>Batch size</p></td><td><p>256</p></td><td><p>256</p></td></tr><tr><td><p>Nesterov</p></td><td><p>1</p></td><td><p>1</p></td></tr><tr><td><p>Momentum</p></td><td><p>0.9</p></td><td><p>0.9</p></td></tr><tr><td><p>Scheduler</p></td><td><p>Cosine</p></td><td><p>Cosine</p></td></tr><tr><td><p>Initial LR</p></td><td><p>2.5 × 10−2</p></td><td><p>2.5 × 10−2</p></td></tr><tr><td><p>Min_LR</p></td><td><p>1 × 10−3</p></td><td><p>0</p></td></tr><tr><td><p>Weight decay</p></td><td><p>5 × 10−4</p></td><td><p>5 × 10−4</p></td></tr></table><p>Compared with NAS-Bench-201 (<em>e </em>= 6, |<em>O</em>| = 5), the DARTS search space (<em>e </em>= 14, |<em>O</em>| = 8) is larger. Therefore, we set the number of epochs to 200 to explore the DARTS search space.</p>
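<p>For orientation, the sketch below shows the alternating update of the network weights w (Nesterov momentum SGD, as configured above) and the architecture parameters A (Adam with default settings) in one search step. It is a simplified, DARTS-style illustration written by us: A is assumed to be registered inside the model and relaxed via a softmax over candidate operations, the EGFA-specific operators (explosion, absorption) that EGFA-NAS applies on top of this are omitted, and whether A is updated on training or held-out batches is not a detail taken from the paper.</p>
<pre><code>import torch

def search_step(model, w_optimizer, a_optimizer, criterion, train_batch, val_batch, device="cuda"):
    """One alternating optimization step: first the weights w, then the architecture
    parameters A (assumed to be the parameter group driven by `a_optimizer`)."""
    model.train()
    # 1) update the weights w on a training batch
    x_tr, y_tr = (t.to(device) for t in train_batch)
    model.zero_grad()
    criterion(model(x_tr), y_tr).backward()
    w_optimizer.step()
    # 2) update the architecture parameters A on a held-out batch
    x_va, y_va = (t.to(device) for t in val_batch)
    model.zero_grad()
    criterion(model(x_va), y_va).backward()
    a_optimizer.step()

# Optimizers configured with the searching-phase settings listed above (illustrative):
# w_optimizer = torch.optim.SGD(weight_params, lr=2.5e-2, momentum=0.9,
#                               nesterov=True, weight_decay=5e-4)
# a_optimizer = torch.optim.Adam(arch_params)  # Adam with default settings
</code></pre>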
The other parameters of EGFA-NAS, such as the population size <em>n</em>, the number of groups <em>g</em>, the absorptivity abs, the maximum radius <em>r</em>max, and the minimum radius <em>r</em>min, are set the same as in “<a href="#bookmark59">Parameter settings for NAS-Bench-201</a>”.</p><p><a id="bookmark9"></a><strong>Experimental results</strong></p><p><strong>Overall results in NAS-Bench-201 search space</strong></p><p>The experimental results of the optimal networks discovered by EGFA-NAS and the other competitors in NAS-Bench-201, in terms of classification accuracy and computational cost (GPU days), are presented in Table <a href="#bookmark91">5</a>. The symbol “–” means that the corresponding result was not reported. The results of iDARTS [<a href="#bookmark75">51</a>] and EvNAS [<a href="#bookmark76">52</a>] are sourced from the original published papers, and the results of the other competitors are extracted from [<a href="#bookmark19">23</a>]. The results highlighted in bold are those of the theoretically optimal architecture and of the architectures searched by EGFA-NAS.</p><p>From the results in Table <a href="#bookmark91">5</a>, we can observe that EGFA-NAS achieves better performance than the peer competitors DARTS-V1 [<a href="#bookmark16">19</a>], DARTS-V2 [<a href="#bookmark16">19</a>], SETN [<a href="#bookmark74">50</a>], iDARTS [<a href="#bookmark75">51</a>], GDAS [<a href="#bookmark38">20</a>], ENAS [<a href="#bookmark15">18</a>], RSPS [<a href="#bookmark18">22</a>], and EvNAS [<a href="#bookmark76">52</a>]. Specifically, in the NAS-Bench-201 search space, EGFA-NAS discovers a network architecture with only 1.29M parameters, which consumes 0.048 GPU days and achieves 93.67% accuracy on CIFAR-10. For the CIFAR-100 dataset, EGFA-NAS achieves 71.29% accuracy with 1.23M parameters and consumes 0.094 GPU days. For ImageNet-16-120, the architecture searched by EGFA-NAS obtains 42.33% accuracy with 1.32M parameters at a cost of 0.236 GPU days. Limited by the small NAS-Bench-201 search space, the performance of the searched network architectures is not comparable with that of state-of-the-art manually designed CNN networks. However, among all competitors in the NAS-Bench-201 search space, the network architecture searched by EGFA-NAS shows the smallest gap to the optimal theoretical architecture (0.7% worse on CIFAR-10, 2.22% worse on CIFAR-100, and 4.95% worse on ImageNet-16-120). In addition, the proposed EGFA-NAS has the best efficiency compared with all selected peer competitors.</p><p>Note that the search cost (GPU days) of the competitors listed in Table <a href="#bookmark91">5</a> is extracted from [<a href="#bookmark19">23</a>], but reference [<a href="#bookmark19">23</a>] does not indicate to which dataset the reported cost belongs. The number of parameters (Params) of the peer competitors is obtained by running the code provided by [<a href="#bookmark19">23</a>] on the CIFAR-10 dataset.</p>
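<p>For reference, the two bookkeeping quantities reported throughout these tables can be computed with a few lines of PyTorch; the helpers below are our own illustration rather than code from [23] or from EGFA-NAS.</p>
<pre><code>import torch

def count_parameters_m(model: torch.nn.Module) -> float:
    """Number of trainable parameters in millions (the 'Params (M)' column)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

def gpu_days(num_gpu_cards: int, search_time_seconds: float) -> float:
    """Search cost as (number of GPU cards) x (search time in days)."""
    return num_gpu_cards * search_time_seconds / 86_400.0
</code></pre>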
<p>The search cost (GPU days) of EGFA-NAS is the computational consumption measured separately for each of the three datasets on a computational platform with one NVIDIA GeForce RTX 3090 GPU card.</p><p><strong>Effectiveness of the relative performance evaluation</strong></p><p>Because NAS-Bench-201 [<a href="#bookmark19">23</a>] provides the evaluation information for each candidate architecture, in this section we utilize the API provided by NAS-Bench-201 to obtain the absolute (final) performance evaluation (loss and accuracy) of the searched architectures without retraining, and thereby verify the effectiveness of the evaluation strategy adopted by EGFA-NAS. Figure <a href="#bookmark92">9</a> compares the relative performance evaluation with the absolute performance evaluation, in terms of loss (Fig. <a href="#bookmark92">9</a>a) and accuracy (Fig. <a href="#bookmark92">9</a>b) on CIFAR-10. In Fig. <a href="#bookmark92">9</a>, the label “rel” represents the relative performance and the label “abs” represents the absolute performance. The relative performance of the searched architectures is obtained on the validation set at the current epoch during the architecture search phase. From the results in Fig. <a href="#bookmark92">9</a>, we can observe that the relative performance of the searched architectures is not comparable with their absolute performance, because the architectures evaluated during the search phase are not trained sufficiently. However, Fig. <a href="#bookmark92">9</a> illustrates that the trend of the relative performance is consistent with that of the absolute performance of the searched architectures. In addition, we can observe that EGFA-NAS is unstable only during the first several epochs and achieves architectures with stable performance once the number of epochs exceeds 30.</p>
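<p>To make the distinction concrete, the sketch below contrasts the two evaluation modes compared in Fig. 9: the relative score is simply the validation accuracy of the partially trained candidate at the current search epoch, while the absolute score is looked up from the NAS-Bench-201 results. This is our own illustration; `benchmark_table` stands in for whatever structure is built from the NAS-Bench-201 release and is not its real API.</p>
<pre><code>import torch

@torch.no_grad()
def relative_accuracy(model, val_loader, device="cuda"):
    """Validation accuracy of a partially trained candidate at the current epoch."""
    model.eval()
    correct = total = 0
    for images, labels in val_loader:
        preds = model(images.to(device)).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return 100.0 * correct / total

def absolute_accuracy(arch_encoding, benchmark_table):
    """Final (fully trained) accuracy looked up from pre-computed NAS-Bench-201 results.
    `benchmark_table` is assumed to map an architecture encoding to its test accuracy."""
    return benchmark_table[arch_encoding]
</code></pre>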
The observation above verifies the effectiveness of the evaluation strategy adopted by EGFA-NAS.</p><p><strong>Table 5 </strong>Comparison of</p><p>EGFA-NAS with the peer</p><p><a id="bookmark91"></a>competitors in terms of the</p><p>classification accuracy (%) and the computational cost (GPU</p><p>days) on CIFAR-10, CIFAR-100, and ImageNet-16-120 datasets</p><table><tr><td><p>Method</p></td><td><p>Search strategy</p></td><td><p>GPU days</p></td><td><p>Params(M)</p></td><td><p>CIFAR-10</p></td><td><p>CIFAR-100</p></td><td><p>ImageNet-16-120</p></td></tr><tr><td><p>DARTS-V1 <a href="#bookmark16">[ 19]</a></p></td><td><p>GD</p></td><td><p>0.13</p></td><td><p>0.07a</p></td><td><p>54.30</p></td><td><p>15.61</p></td><td><p>16.32</p></td></tr><tr><td><p>DARTS-V2 <a href="#bookmark16">[ 19]</a></p></td><td><p>GD</p></td><td><p>0.41</p></td><td><p>0.07a</p></td><td><p>54.30</p></td><td><p>15.61</p></td><td><p>16.32</p></td></tr><tr><td><p>iDARTS <a href="#bookmark75">[51]</a></p></td><td><p>GD</p></td><td><p>–</p></td><td><p>–</p></td><td><p>93.58</p></td><td><p>70.83</p></td><td><p>40.89</p></td></tr><tr><td><p>SETN <a href="#bookmark74">[50]</a></p></td><td><p>GD</p></td><td><p>0.35</p></td><td><p>0.41a</p></td><td><p>86.19</p></td><td><p>56.87</p></td><td><p>31.90</p></td></tr><tr><td><p>GDAS <a href="#bookmark38">[20]</a></p></td><td><p>GD</p></td><td><p>0.33</p></td><td><p>1.2a</p></td><td><p>93.51</p></td><td><p>70.61</p></td><td><p>41.71</p></td></tr><tr><td><p>ENAS <a href="#bookmark15">[ 18]</a></p></td><td><p>RL</p></td><td><p>0.15</p></td><td><p>0.07a</p></td><td><p>54.30</p></td><td><p>15.61</p></td><td><p>16.32</p></td></tr><tr><td><p>RSPS <a href="#bookmark18">[22]</a></p></td><td><p>Random</p></td><td><p>0.10</p></td><td><p>0.43a</p></td><td><p>87.66</p></td><td><p>58.33</p></td><td><p>31.44</p></td></tr><tr><td><p>EvNAS <a href="#bookmark76">[52]</a></p></td><td><p>EA</p></td><td><p>0.26</p></td><td><p>–</p></td><td><p>92.18</p></td><td><p>66.74</p></td><td><p>39.00</p></td></tr><tr><td><p>Optimal</p></td><td><p> </p></td><td><p> </p></td><td><p> </p></td><td><p><strong>94.37</strong></p></td><td><p><strong>73.51</strong></p></td><td><p><strong>47.31</strong></p></td></tr><tr><td><p>EGFA-NAS</p></td><td><p><strong>EGFA</strong></p></td><td><p><strong>0.048</strong></p></td><td><p><strong>1.29</strong></p></td><td><p><strong>93.67</strong></p></td><td><p>–</p></td><td><p>–</p></td></tr><tr><td><p>EGFA-NAS</p></td><td><p><strong>EGFA</strong></p></td><td><p><strong>0.094</strong></p></td><td><p><strong>1.23</strong></p></td><td><p>–</p></td><td><p><strong>71.29</strong></p></td><td><p>–</p></td></tr><tr><td><p>EGFA-NAS</p></td><td><p><strong>EGFA</strong></p></td><td><p><strong>0.246</strong></p></td><td><p><strong>1.32</strong></p></td><td><p>–</p></td><td><p>–</p></td><td><p><strong>42.33</strong></p></td></tr></table><p>aCalculated by running the code publicly released by <a href="#bookmark19">[23]</a></p><p><img src="/media/202408//1724856293.044161.png" /></p><p><strong>Fig. 9 </strong>Comparison of relative evaluation and absolute evaluation of the architecture searched by EGFA-NAS</p><p><a id="bookmark92"></a><strong>Effectiveness of weight inheritance strategy</strong></p><p><img src="/media/202408//1724856293.1771169.png" />To improve the efficiency ofEGFA-NAS andreducethe com- putational cost, we propose a weight inheritance strategy <a href="#bookmark60">during the explosion operation as described in “Explosion operation and weights inheritance” . 
Specifically, the parameters</a> w of newly generated dust individuals are inherited from their centers. In this section, we attempt to verify the effectiveness of the weight inheritance strategy by replacing it with random generation of the parameters w on CIFAR-10, keeping all other settings unchanged. To observe the difference between the proposed strategy and random generation of the parameters w more clearly, we set the number of epochs to 300 in this experiment. The estimated (relative) performance of the network architectures searched with weight inheritance and with randomly generated parameters w is shown in Fig. <a href="#bookmark93">10</a>a and c, in terms of accuracy and loss, respectively. The final (absolute) performance of the network architectures searched by the two strategies is shown in Fig. <a href="#bookmark93">10</a>b and d, in terms of accuracy and loss, respectively. The results in Fig. <a href="#bookmark93">10</a> show a large difference between the estimated (relative) performance of the two strategies. Although the final (absolute) performance of the architectures searched by the two strategies is similar on CIFAR-10, EGFA-NAS with the proposed weight inheritance reaches the best network architecture earlier than with randomly generated parameters w. In addition, the final performance of the architecture searched with weight inheritance is slightly better (93.67% accuracy) than that obtained with randomly generated parameters w (96.36% accuracy).</p><p><img src="/media/202408//1724856293.2076328.png" /></p><p><strong>Fig. 10 </strong>Comparison of the performance of EGFA-NAS using the weight inheritance strategy and generating the parameter w randomly on CIFAR-10</p><p><a id="bookmark93"></a><strong>Overall results in DARTS search space</strong></p><p>The experimental results of the optimal network discovered by EGFA-NAS in the DARTS search space, in terms of classification accuracy and computational cost (GPU days), are presented in Table <a href="#bookmark94">6</a>. The symbol “–” means that the corresponding results were not reported. The symbol “*” means that the results are extracted from [<a href="#bookmark16">19</a>]. The notation “a/b” in Table <a href="#bookmark94">6</a> means that “a” is the result for CIFAR-10 and “b” is the result for CIFAR-100. The results of most competitors are extracted from the original published papers. <em>B </em>= 6 or 8 denotes the number of normal cells in a normal block in the retraining phase. The results highlighted in bold are those of the architectures searched by EGFA-NAS.</p><p>The results in Table <a href="#bookmark94">6</a> show that EGFA-NAS (<em>B </em>= 8) achieves better performance than most state-of-the-art manually designed CNN networks, including ResNet-101, ResNet + CutOut, SENet, IGCV3, ShuffleNet, VGG, and Wide ResNet, but slightly worse (by 1.05% on CIFAR-100) than DenseNet-BC. Compared with VGG, the optimal network architecture searched by EGFA-NAS (<em>B </em>= 8) improves accuracy by 13.9% on CIFAR-100 and 3.89% on CIFAR-10.</p><p>Compared with the 12 EA-based NAS methods, EGFA-NAS (<em>B </em>= 8) achieves better performance than Hierarchical EA, AmoebaNet-A, CGP-CNN, CNN-GA, AE-CNN, AE-CNN + E2EPP, LargeEvo, GeNet, SI-EvoNet, and MOEA-PS, but slightly worse than LEMONADE (0.19%) and NSGA-Net (0.02%) on CIFAR-10.
EGFA-NAS (<em>B </em>= 8) achieves the best classification accuracy (81.85%) on the CIFAR-100, and consumes the least search cost (0.21 GPU days) than all selected EA-based NAS methods.</p><p>Compared with the six RL-based NAS methods, EGFA- NAS (<em>B </em>= 8) achieves better performance than NASNet-A, NASNet-A + CutOut, BlockQNN, DPP-Net, MetaQNN, and ENAS, but a little worse than Proxyless NAS (0.86%) on the CIFAR-10. The performance improvement of the opti- mal network architecture searched by EGFA-NAS (<em>B </em>= 8) is 4.15% on the CIFAR-10, and 8.99% on the CIFAR-100,</p><p><strong>Table 6 </strong>Comparison of</p><p>EGFA-NAS with the peer</p><p><a id="bookmark94"></a>competitors in terms of the</p><p>classification accuracy (%) and the computational cost (GPU</p><p>days) on CIFAR-10, CIFAR-100</p><table><tr><td><p>Method</p></td><td><p>Search strategy</p></td><td><p>GPU days</p></td><td><p>Params (M)</p></td><td><p>CIFAR-10</p></td><td><p>CIFAR-100</p></td></tr><tr><td><p>ResNet-101 <a href="#bookmark7">[ 10]</a></p></td><td><p>Manual</p></td><td><p>–</p></td><td><p>1.7</p></td><td><p>93.57</p></td><td><p>74.84</p></td></tr><tr><td><p>ResNet + CutOut <a href="#bookmark7">[ 10]</a></p></td><td><p>Manual</p></td><td><p> </p></td><td><p>1.7</p></td><td><p>95.39</p></td><td><p>77.90</p></td></tr><tr><td><p>DenseNet-BC <a href="#bookmark8">[ 11]</a></p></td><td><p>Manual</p></td><td><p>–</p></td><td><p>25.6</p></td><td><p>96.54</p></td><td><p>82.82</p></td></tr><tr><td><p>SENet <a href="#bookmark77">[53]</a></p></td><td><p>Manual</p></td><td><p>–</p></td><td><p>11.2</p></td><td><p>95.95</p></td><td><p>–</p></td></tr><tr><td><p>IGCV3 <a href="#bookmark78">[54]</a></p></td><td><p>Manual</p></td><td><p>–</p></td><td><p>2.2</p></td><td><p>94.96</p></td><td><p>77.95</p></td></tr><tr><td><p>ShuffleNet <a href="#bookmark79">[55]</a></p></td><td><p>Manual</p></td><td><p> </p></td><td><p>1.06</p></td><td><p>90.87</p></td><td><p>77.14</p></td></tr><tr><td><p>VGG <a href="#bookmark1">[ 1]</a></p></td><td><p>Manual</p></td><td><p>–</p></td><td><p>28.05</p></td><td><p>93.34</p></td><td><p>67.95</p></td></tr><tr><td><p>Wide ResNet <a href="#bookmark80">[56]</a></p></td><td><p>Manual</p></td><td><p>–</p></td><td><p>36.48</p></td><td><p>95.83</p></td><td><p>79.50</p></td></tr><tr><td><p>Hierarchical EA <a href="#bookmark12">[15]</a></p></td><td><p>EA</p></td><td><p>300</p></td><td><p>61.3</p></td><td><p>96.37</p></td><td><p>–</p></td></tr><tr><td><p>AmoebaNet-A <a href="#bookmark13">[ 16]</a></p></td><td><p>EA</p></td><td><p>3150</p></td><td><p>3.2</p></td><td><p>96.66</p></td><td><p>81.07</p></td></tr><tr><td><p>LEMONADE <a href="#bookmark20">[24]</a></p></td><td><p>EA</p></td><td><p>90</p></td><td><p>13.1</p></td><td><p>97.42</p></td><td><p>–</p></td></tr><tr><td><p>CGP-CNN <a href="#bookmark32">[25]</a></p></td><td><p>EA</p></td><td><p>27</p></td><td><p>1.7</p></td><td><p>94.02</p></td><td><p>–</p></td></tr><tr><td><p>CNN-GA <a href="#bookmark33">[26]</a></p></td><td><p>EA</p></td><td><p>35/40</p></td><td><p>2.9/4.1</p></td><td><p>96.78</p></td><td><p>79.47</p></td></tr><tr><td><p>AE-CNN <a href="#bookmark34">[32]</a></p></td><td><p>EA</p></td><td><p>27/36</p></td><td><p>2.0/5.4</p></td><td><p>95.3</p></td><td><p>77.6</p></td></tr><tr><td><p>AE-CNN + E2EPP <a href="#bookmark35">[33]</a></p></td><td><p>EA</p></td><td><p>7/10</p></td><td><p>4.3/20.9</p></td><td><p>94.7</p></td><td><p>77.98</p></td></tr><tr><td><p>LargeEvo <a 
href="#bookmark31">[27]</a></p></td><td><p>EA</p></td><td><p>2750/2750</p></td><td><p>5.4/40.4</p></td><td><p>94.6</p></td><td><p>77.00</p></td></tr><tr><td><p>GeNet <a href="#bookmark30">[31]</a></p></td><td><p>EA</p></td><td><p>–</p></td><td><p>–</p></td><td><p>94.61</p></td><td><p>74.88</p></td></tr><tr><td><p>SI-EvoNet <a href="#bookmark81">[57]</a></p></td><td><p>EA</p></td><td><p>0.46/0.81</p></td><td><p>0.51/0.99</p></td><td><p>96.02</p></td><td><p>79.16</p></td></tr><tr><td><p>NSGA-Net <a href="#bookmark21">[28]</a></p></td><td><p>EA</p></td><td><p>4/8</p></td><td><p>3.3/3.3</p></td><td><p>97.25</p></td><td><p>79.26</p></td></tr><tr><td><p>MOEA-PS <a href="#bookmark82">[58]</a></p></td><td><p>EA</p></td><td><p>2.6/5.2</p></td><td><p>3.0/5.8</p></td><td><p>97.23</p></td><td><p>81.03</p></td></tr><tr><td><p>NASNet-A <a href="#bookmark14">[ 17]</a></p></td><td><p>RL</p></td><td><p>2000</p></td><td><p>3.3</p></td><td><p>96.59</p></td><td><p> </p></td></tr><tr><td><p>NASNet-A + CutOut <a href="#bookmark14">[ 17]</a></p></td><td><p>RL</p></td><td><p>2000</p></td><td><p>3.1</p></td><td><p>97.17</p></td><td><p>–</p></td></tr><tr><td><p>Proxyless NAS <a href="#bookmark36">[34]</a></p></td><td><p>RL</p></td><td><p>1500</p></td><td><p>5.7</p></td><td><p>97.92</p></td><td><p>–</p></td></tr><tr><td><p>BlockQNN <a href="#bookmark37">[35]</a></p></td><td><p>RL</p></td><td><p>96</p></td><td><p>39.8</p></td><td><p>96.46</p></td><td><p>–</p></td></tr><tr><td><p>DPP-Net <a href="#bookmark84">[59]</a></p></td><td><p>RL</p></td><td><p>8</p></td><td><p>0.45</p></td><td><p>94.16</p></td><td></td></tr><tr><td><p>MetaQNN <a href="#bookmark85">[60]</a></p></td><td><p>RL</p></td><td><p>90</p></td><td><p>11.2</p></td><td><p>93.08</p></td><td><p>72.86</p></td></tr><tr><td><p>ENAS <a href="#bookmark15">[ 18]</a></p></td><td><p>RL</p></td><td><p>0.5</p></td><td><p>4.6</p></td><td><p>97.06</p></td><td><p>–</p></td></tr><tr><td><p>ENAS <a href="#bookmark15">[ 18</a>]*</p></td><td><p>RL</p></td><td><p>4</p></td><td><p>4.2</p></td><td><p>97.09</p></td><td><p>–</p></td></tr><tr><td><p>DARTS-V1 + CutOut <a href="#bookmark16">[ 19]</a></p></td><td><p>GD</p></td><td><p>1.5</p></td><td><p>3.3</p></td><td><p>97.00</p></td><td></td></tr><tr><td><p>DARTS-V2 + CutOut <a href="#bookmark16">[ 19]</a></p></td><td><p>GD</p></td><td><p>4</p></td><td><p>3.4</p></td><td><p>97.18</p></td><td><p>82.46</p></td></tr><tr><td><p>RC-DARTS <a href="#bookmark41">[38]</a></p></td><td><p>GD</p></td><td><p>1</p></td><td><p>0.43</p></td><td><p>95.83</p></td><td></td></tr><tr><td><p>SNAS <a href="#bookmark17">[21]</a></p></td><td><p>GD</p></td><td><p>1.5</p></td><td><p>2.8</p></td><td><p>97.15</p></td><td><p>–</p></td></tr><tr><td><p>PNAS <a href="#bookmark43">[40]</a></p></td><td><p>SMBO</p></td><td><p>225</p></td><td><p>3.2</p></td><td><p>96.37</p></td><td><p>80.47</p></td></tr><tr><td><p>EGFA-NAS (<em>B </em>= 6)</p></td><td><p><strong>EGFA</strong></p></td><td><p><strong>0.21/0.4</strong></p></td><td><p><strong>2.56/2.15</strong></p></td><td><p><strong>96.57</strong></p></td><td><p><strong>80.08</strong></p></td></tr><tr><td><p>EGFA-NAS (<em>B </em>= 8)</p></td><td><p><strong>EGFA</strong></p></td><td><p><strong>0.21/0.4</strong></p></td><td><p><strong>3.47/2.88</strong></p></td><td><p><strong>97.23</strong></p></td><td><p><strong>81.85</strong></p></td></tr></table><p>*Extracted from the reference <a href="#bookmark16">[ 19]</a></p><p>compared with MetaQNN. 
The proposed EGFA-NAS (<em>B </em>= 8) has the best efficiency and consumes the fewest GPU days, even compared with ENAS, which is reported to consume only 0.5 GPU days on CIFAR-10 in its published paper.</p><p>Compared with the four GD-based NAS methods and PNAS, EGFA-NAS (<em>B </em>= 8) achieves better performance than DARTS-V1 + CutOut, RC-DARTS, and SNAS, but slightly worse (by 0.61%) than DARTS-V2 + CutOut on CIFAR-100. Although GD-based NAS methods usually have better efficiency than EA-based and RL-based methods, the proposed EGFA-NAS (<em>B </em>= 8) has the best efficiency compared with all selected GD-based NAS methods.</p><p>In addition, EGFA-NAS can obtain better final learning accuracy when a larger number of cells per normal block is used during the retraining phase, although this leads to a larger number of parameters. The overall results in Table <a href="#bookmark94">6</a> show that the proposed EGFA-NAS not only has competitive learning accuracy but also has the best efficiency compared with the four kinds of competitors.</p><p><a id="bookmark26"></a><strong>Conclusion</strong></p><p>This paper proposes an efficient population-based NAS method based on EGFA, called EGFA-NAS, which can achieve an optimal neural architecture with competitive learning accuracy at a small computational cost. Specifically, EGFA-NAS relaxes the discrete search space to a continuous one and then utilizes EGFA and gradient descent in conjunction to optimize the weights of the candidate architectures. The proposed training and weight inheritance strategies reduce the computational cost of EGFA-NAS dramatically. The experimental results in two typical micro search spaces, NAS-Bench-201 and DARTS, demonstrate that EGFA-NAS is able to match or outperform state-of-the-art NAS methods on image classification tasks with a remarkable efficiency improvement. Specifically, when searching on CIFAR-10 with <a id="bookmark2"></a>one NVIDIA GeForce RTX 3090 GPU card, EGFA-NAS obtains <a id="bookmark3"></a>the optimal neural architecture in the NAS-Bench-201 search space with 93.67% accuracy while consuming only 0.048 GPU days, and discovers the optimal neural architecture in the DARTS search space with 97.23% accuracy at a cost of 0.21 GPU days.</p><p>Although EGFA-NAS is promising for designing high-performance neural networks automatically, it still has one limitation. Similar to other NAS methods using a low-fidelity evaluation strategy, the relative evaluation adopted by EGFA-NAS during the search phase may lead to missing some promising architectures. In future work, we will attempt to design a better evaluation strategy with better rank consistency for lightweight NAS.</p><p><strong>Acknowledgements </strong>This work was supported by the National Natural Science Foundation of China (no. 62072212), the Development Project of Jilin Province of China (no. 20220508125RC, 20230201065GX), and the Jilin Provincial Key Laboratory of Big Data Intelligent Cognition (no. 20210504003GH).</p><p><strong>Data availability </strong>Data will be made available on request.</p><p>
<strong>Declarations</strong></p><p><strong>Conflict of interest </strong>On behalf of all authors, the corresponding author states that there is no conflict of interest.</p><p><strong>Open Access </strong>This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing,adap- tation, distribution and reproduction in any medium or format, as <a id="bookmark14"></a>long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indi- cate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, <a id="bookmark15"></a>unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the</p><p><img src="/media/202408//1724856293.5380611.png" />permitteduse, youwillneedtoobtainpermissiondirectlyfromthe copy- <a href="http://creativecommons.org/licenses/by/4.0/">right holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/.</a></p><p><a id="bookmark1"></a><strong>References</strong></p><p><img src="/media/202408//1724856293.5474.png" />1. Simonyan K, Zisserman A (2015) Very Deep Convolutional Net- <a href="https://arxiv.org/abs/1409.1556">works for Large-Scale Image Recognition. arXiv preprint, arXiv: 1409.1556</a></p><p>2. Huang G, Sun Y, Liu Z etal (2016) Deep networks with stochastic depth. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision—ECCV 2016. Springer International Publishing, Cham, pp 646–661</p><p>3. CiresanD, MeierU, SchmidhuberJ(2012)Multi-columndeep neu- ral networks for image classification. In: Proceedings of the IEEE international conference on computer vision. CVPR, Providence, pp 3642–3649</p><p>4. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classifi- cation with deep convolutional neural networks. Commun ACM 60:84–90. <a href="https://doi.org/10.1145/3065386">https://doi.org/10.1145/3065386</a></p><p>5. Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE interna- tional conference on computer vision. ICCV, pp 1440–1448</p><p>6. Zhao Z,Zheng P, XuS (2019) Object detection with deep learning: a review. IEEE Trans Neural Netw Learning Syst 30:3212–3232.</p><p><a id="bookmark4"></a><a href="https://doi.org/10.1109/TNNLS.2018.2876865">https://doi.org/10.1109/TNNLS.2018.2876865</a></p><p>7. Zoph B, Le QV (2017) Neural Architecture Search with Reinforce- <a id="bookmark5"></a>ment Learning. arXiv preprint, <a href="https://arxiv.org/abs/1611.01578">arXiv:1611.01578</a></p><p><img src="/media/202408//1724856293.594785.png" />8. Hesamian MH, Jia W, He X, Kennedy P (2019) Deep learning techniques for medical image segmentation: achievements and <a href="https://doi.org/10.1007/s10278-019-00227-x">challenges. J Digit Imaging 32:582–596. </a><a href="https://doi.org/10.1007/">https://doi.org/10.1007/</a><a href="https://doi.org/10.1007/s10278-019-00227-x"> </a><a id="bookmark6"></a><a href="https://doi.org/10.1007/s10278-019-00227-x">s10278-019-00227-x</a></p><p>9. Ghosh S, Das N, Das I, Maulik U (2020) Understanding deep learning techniques for image segmentation. ACM Comput Surv <a id="bookmark7"></a>52:1–35. <a href="https://doi.org/10.1145/3329784">https://doi.org/10.1145/3329784</a></p><p>10. He K,ZhangX, RenS,etal(2016)Deepresiduallearningforimage recognition. 
In: Proceedings of the IEEE conference on computer <a id="bookmark8"></a>vision and pattern recognition. CVPR, pp 770–778</p><p>11. Huang G, Liu Z, VanDer Maaten L et al (2017) Densely connected convolutionalnetworks. In:ProceedingsoftheIEEE conference on computer vision and pattern recognition. pp 4700–4708</p><p><a id="bookmark10"></a>12. Praczyk T (2016) Cooperative co-evolutionary neural networks. IFS 30:2843–2858. <a href="https://doi.org/10.3233/IFS-162095">https://doi.org/10.3233/IFS-162095</a></p><p><img src="/media/202408//1724856293.711373.png" />13. Garcia-Pedrajas N, Hervas-Martinez C, Munoz-Perez J (2003) COVNET: a cooperative coevolutionary model for evolving artifi- <a href="https://doi.org/10.1109/TNN.2003.810618">cialneuralnetworks. IEEE Trans Neural Netw14:575–596. https:// </a><a id="bookmark11"></a><a href="https://doi.org/10.1109/TNN.2003.810618">doi.org/10.1109/TNN.2003.810618</a></p><p>14. Yao X (1999) Evolving artificial neural networks. Proc IEEE <a id="bookmark12"></a>87:1423–1447. <a href="https://doi.org/10.1109/5.784219">https://doi.org/10.1109/5.784219</a></p><p><img src="/media/202408//1724856293.731762.png" />15. Liu H, Simonyan K, Vinyals O, et al (2018) Hierarchical Repre- <a href="https://arxiv.org/abs/1711.00436">sentations for Efficient Architecture Search. arXiv preprint, arXiv: 1711.00436</a></p><p>16. Real E, Aggarwal A, Huang Y, Le QV (2019) Regularized evolu- <a id="bookmark13"></a>tion for image classifier architecture search. AAAI 33:4780–4789.</p><p><a href="https://doi.org/10.1609/aaai.v33i01.33014780">https://doi.org/10.1609/aaai.v33i01.33014780</a></p><p>17. Zoph B, Vasudevan V, Shlens J, Le QV (2018) Learning transfer- able architectures for scalable image recognition. In: Proceedings oftheIEEE conference on computervisionandpatternrecognition. CVPR, pp 8697–8710</p><p>18. Pham H, Guan M,Zoph B, et al (2018) Efficient neural architecture search via parameters sharing. In: Proceedings of the 35th Interna- tional Conference on Machine Learning. PMLR, pp 4095–4104</p><p>19. Liu H, Simonyan K, Yang Y (2019) DARTS: Differentiable Archi- <a id="bookmark16"></a>tecture Search. arXiv preprint, <a href="https://arxiv.org/abs/1806.09055">arXiv:1806.09055</a></p><p>20. Dong X, Yang Y (2019) Searching for a robust neural architec- <a id="bookmark38"></a><a id="bookmark43"></a>ture in four gpu hours. In: Proceedings of the IEEE international <a id="bookmark17"></a>conference on computer vision. CVPR, pp 1761–1770</p><p>21. Xie S, Zheng H, Liu C, Lin L (2020) SNAS: Stochastic Neural Architecture Search. arXiv preprint, <a href="https://arxiv.org/abs/1812.09926">arXiv:1812.09926</a></p><p>22. Li L, Talwalkar A (2020) Random search and reproducibility for <a id="bookmark18"></a><a id="bookmark19"></a>neural architecture search. In: Adams RP, Gogate V (eds) Proceed- ings of the 35th uncertainty in artificial intelligence conference. PMLR, pp 367–377</p><p><img src="/media/202408//1724856293.812905.png" />23. Dong X, Yang Y (2020) NAS-Bench-201: Extending the Scope of <a href="https://arxiv.org/abs/2001.00326">Reproducible Neural Architecture Search. arXiv preprint, arXiv: 2001.00326</a></p><p><a id="bookmark20"></a>24. Elsken T, Metzen JH, Hutter F (2019) Efficient Multi-objective Neural Architecture Search via Lamarckian Evolution. arXiv <a id="bookmark32"></a>preprint, <a href="https://arxiv.org/abs/1804.09081">arXiv:1804.09081</a></p><p>25. 
Suganuma M, Shirakawa S, Nagao T (2017) A genetic pro- gramming approach to designing convolutional neural network <a id="bookmark46"></a>architectures. In: Proceedings of the genetic and evolutionary com- <a id="bookmark33"></a>putation conference. ACM, Berlin, pp 497–504</p><p><img src="/media/202408//1724856293.834065.png" />26. SunY, Xue B, Zhang Metal (2020) Automatically designing CNN architectures using the genetic algorithm for image classification. IEEETrans Cybern50:3840–3854. <a href="https://doi.org/10.1109/TCYB">https://doi.org/10.1109/TCYB</a>. 2020.2983860</p><p><a id="bookmark31"></a>27. Real E, Moore S, Selle A et al (2017) Large-scale evolution of <a id="bookmark51"></a>image classifiers. In: Proceedings of the 34th international confer- <a id="bookmark21"></a>ence on machine learning. PMLR, pp 2902–2911</p><p>28. Lu Z, Whalen I, Boddeti V, et al (2019) NSGA-Net: neural architecture search using multi-objective genetic algorithm. In: Proceedings of the genetic and evolutionary computation confer- ence. ACM, Prague, pp 419–427</p><p>29. Hu X, Huang L, Wang Y, Pang W (2019) Explosion gravitation <a id="bookmark22"></a>field algorithm with dust sampling for unconstrained optimization. Appl Soft Comput81:105500. <a href="https://doi.org/10.1016/j.asoc.2019">https://doi.org/10.1016/j.asoc.2019</a>. 105500</p><p>30. Gould S, Fernando B, Cherian A, et al (2016) On differentiating <a id="bookmark29"></a><a id="bookmark75"></a>parameterized argmin and argmax problems with application to <a id="bookmark30"></a>bi-level optimization. arXiv:1607.05447</p><p>31. Xie L, Yuille A (2017) Genetic CNN. In: Proceedings of the IEEE <a id="bookmark34"></a><a id="bookmark76"></a>international conference on computer vision. ICCV, pp 1379–1388</p><p>32. Sun Y, Xue B, Zhang M, Yen GG (2020) Completely automated CNN architecturedesignbased on blocks. IEEE Trans Neural Netw Learn Syst 31:1242–1254. <a href="https://doi.org/10.1109/TNNLS.2019">https://doi.org/10.1109/TNNLS.2019</a>. 2919608</p><p><a id="bookmark35"></a>33. SunY, Wang H, Xue B et al (2020) Surrogate-assisted evolutionary deep learning using an end-to-end random forest-based perfor- mance predictor. IEEE Trans Evol Comput 24:350–364. https:// <a id="bookmark36"></a>doi.org/10.1109/TEVC.2019.2924461</p><p>34. Cai H, Zhu L, Han S (2019) ProxylessNAS: Direct Neural Archi- tectureSearch on Target Task and Hardware. arXiv preprint, arXiv: 1812.00332</p><p>35. Zhong Z, Yang Z, Deng B et al (2021) BlockQNN: efficient block- <a id="bookmark37"></a>wise neural network architecture generation. IEEE Trans Pattern <a id="bookmark39"></a>Anal Mach Intell 43:2314–2328. <a href="https://doi.org/10.1109/TPAMI">https://doi.org/10.1109/TPAMI</a>. 2020.2969193</p><p>36. Chu X, Wang X, ZhangB,etal(2021)DARTS-:RobustlyStepping out of Performance Collapse Without Indicators. arXiv preprint,</p><p><a href="https://arxiv.org/abs/2009.01027">arXiv:2009.01027</a></p><p>37. Liang H, Zhang S, Sun J, et al (2020) DARTS+: Improved Differ- entiable Architecture Search with Early Stopping. arXiv preprint,</p><p><a href="https://arxiv.org/abs/1909.06035">arXiv:1909.06035</a></p><p><img src="/media/202408//1724856293.938485.png" />38. Jin X, Wang J, Slocum J, et al (2019) RC-DARTS: Resource Con- <a href="https://arxiv.org/abs/1912.12814">strained Differentiable Architecture Search. arXiv preprint, arXiv: 1912.12814</a></p><p>39. Ye P, LiB, Li Y, et al (2022) β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search. 
In:ProceedingsoftheIEEE <a id="bookmark42"></a>conference on computer vision and pattern recognition. CVPR,</p><p><img src="/media/202408//1724856293.955188.png" />New Orleans, LA, USA, pp 10864–10873. <a href="https://doi.org/10.1109/">https://doi.org/10.1109/</a> CVPR52688.2022.01060</p><p>40. Liu C, Zoph B, Neumann M et al (2018) Progressive neural archi- tecture search. In: Proceedings of the European conference on computer vision. ECCV, pp 19–34</p><p>41. ZhengM,LiuG,ZhouCet al(2010)Gravitationfieldalgorithmand its application in gene cluster. Algorithms Mol Biol 5:32. https:// doi.org/10.1186/1748-7188-5-32</p><p>42. Zheng M, Sun Y, Liu G et al (2012) Improved gravitation field algorithm and its application in hierarchical clustering. PLoS One <a id="bookmark44"></a>7:e49039. <a href="https://doi.org/10.1371/journal.pone.0049039">https://doi.org/10.1371/journal.pone.0049039</a></p><p><img src="/media/202408//1724856293.985219.png" />43. Zheng M, Wu J, Huang Y et al (2012) Inferring gene regulatory networks by singular value decomposition and gravitation field <a href="https://doi.org/10.1371/journal.pone.0051141">algorithm. PLoS One 7:e51141. </a><a href="https://doi.org/10.1371/journal">https://doi.org/10.1371/journal</a><a href="https://doi.org/10.1371/journal.pone.0051141">. pone.0051141</a></p><p>44. Safronov VS (1972) Evolution of the protoplanetary cloud and <a id="bookmark45"></a>formation of the earth and the planets. Israel Program for Scientific Translations, Jerusalem</p><p>45. Huang L, Hu X, Wang Y, Fu Y (2022) EGFAFS: a novel feature selection algorithm based on explosion gravitation field algorithm. Entropy 24:873. <a href="https://doi.org/10.3390/e24070873">https://doi.org/10.3390/e24070873</a></p><p><a id="bookmark48"></a>46. Real E, Moore S, Selle A, et al (2017) Large-scale evolution of imageclassifiers. In:International conference on machinelearning. PMLR, pp 2902–2911</p><p>47. Krizhevsky A, Hinton G (2009) Learning multiple layers of fea- <a id="bookmark52"></a>tures from tiny images. 7.</p><p>48. ChrabaszczP, Loshchilov I, Hutter F (2017) A Downsampled Vari- ant of ImageNet as an Alternative to the CIFAR datasets. arXiv <a id="bookmark64"></a>preprint, <a href="https://arxiv.org/abs/1707.08819">arXiv:1707.08819</a></p><p>49. Zhang Z, Sabuncu M (2018) Generalized cross entropy loss for training deep neural networks with noisy labels. Adv Neural Inf Process Syst. 31.</p><p><a id="bookmark74"></a>50. Dong X, Yang Y (2019) One-shot neural architecture search via self-evaluated template network. In: Proceedings of the IEEE inter- national conference on computer vision. ICCV, pp 3681–3690</p><p>51. Zhang M, Su SW, Shirui P et al (2021) iDARTS: Differentiable architecture search with stochastic implicit gradients. In: Proceed- ings of the 38th international conference on machine learning. PMLR, pp 12557–12566</p><p>52. Sinha N, Chen K-W (2021) Evolving neural architecture using one shot model. In: Proceedings of the genetic and evolutionary computation conference. ACM, Lille France, pp 910–918</p><p><a id="bookmark77"></a>53. Jie H, Li S, Gang S (2018) Squeeze-and-excitation networks. In: ProceedingsoftheIEEE conference on computervisionandpattern <a id="bookmark78"></a>recognition. CVPR, pp 7132–7141</p><p>54. Sun K, Li M, Liu D, Wang J (2018) IGCV3: Interleaved Low- Rank Group Convolutions for Efficient Deep Neural Networks. arXiv preprint, <a href="https://arxiv.org/abs/1806.00178">arXiv:1806.00178</a></p><p>55. 
Zhang X, Zhou X, Lin M, Sun J (2018) ShuffleNet: an extremely <a id="bookmark79"></a>efficient convolutional neural network for mobile devices. In: Pro- ceedings of the IEEE conference on computer vision and pattern <a id="bookmark80"></a>recognition. CVPR, pp 6848–6856</p><p>56. ZagoruykoS, KomodakisN(2017)WideResidualNetworks. arXiv <a id="bookmark81"></a>preprint, <a href="https://arxiv.org/abs/1605.07146">arXiv:1605.07146</a></p><p><img src="/media/202408//1724856294.01585.png" />57. Zhang H, Jin Y, Cheng R, Hao K (2021) Efficient evolutionary searchofattention convolutionalnetworks viasampledtraining and node inheritance. IEEE Trans Evol Comput 25:371–385. https:// <a id="bookmark40"></a><a id="bookmark82"></a>doi.org/10.1109/TEVC.2020.3040272</p><p>58. Xue Y, Chen C, Słowik A (2023) Neural architecture search based on a multi-objective evolutionary algorithm with probability stack. <a id="bookmark41"></a>IEEE Trans Evol Comput 27:778–786. <a href="https://doi.org/10.1109/">https://doi.org/10.1109/</a> TEVC.2023.3252612</p><p>59. Dong J, Cheng AC, Juan D, et al (2018) DPP-Net: device-aware <a id="bookmark84"></a>progressive search for pareto-optimal neural architectures. In: Pro- ceedings of the European conference on computer vision. ECCV, pp 517–531</p><p>60. Baker B, Gupta O, NaikN, RaskarR (2017) Designing Neural Net- work Architectures using Reinforcement Learning. arXiv preprint,</p><p><a href="https://arxiv.org/abs/1611.02167">arXiv:1611.02167</a></p><p><a id="bookmark85"></a><a id="bookmark86"></a>61. Deng J, Dong W, Socher R, et al (2009) ImageNet: a large-scale hierarchical image database. In: Proceedings of the IEEE interna- tional conference on computer vision. CVPR, Miami, pp 248–255</p><p><a id="bookmark89"></a>62. Fan L, Wang H (2022) Surrogate-assisted evolutionary neural architecture search with network embedding. Complex Intell Syst.</p><p><a href="https://doi.org/10.1007/s40747-022-00929-w">https://doi.org/10.1007/s40747-022-00929-w</a></p><p><strong>Publisher’s Note </strong>Springer Nature remains neutral with regard to juris- dictional claims in published maps and institutional affiliations.</p>