Curriculum Vitae
Education
- B.S. in Communication Engineering, UESTC, 2014-2018
- M.S. in Electrical and Computer Engineering, Georgia Institute of Technology, 2019 - Dec 2020
Internship experience
KPMG
• Implemented a database system for the data; cleaned and manipulated large datasets from different servers.
• Introduced mechanisms to optimize the database and queries (index optimization, data table optimization, etc.), reducing the completion time of multiple tasks by 20%.
• Applied scikit-learn to predict growth in the number of customers; collaborated with teammates to develop reports and dashboards using Tableau.
Internship at Chengdu Zhunxingyunxue Technology
• Helped build an image labelling system and data management platform to improve the efficiency of the image labelling team; collaborated with colleagues on programming and debugging.
• Added a real-time detection module to the physical layer of the network system to report single-route link problems, and implemented a recovery module for the virtual network platform.
• Delivered data management and image labelling functionality for iterative development by implementing a REST API based on the Spring Framework.
• Optimized the infrastructure that allows team members to SSH into the development environment; rebuilt the lab's SSH bastion for connecting to Kubernetes nodes in different areas.
• Improved the previous batch process, which synchronizes data models between teams, using Kafka, reducing teams’ waiting time by 30 to 60 minutes.
• Designed an image recognition model using an optimized ResNet with Online Hard Example Mining and multi-GPU training with a parameter server architecture.
• The optimized model achieved higher accuracy and faster training than the built-in ResNet in PyTorch.
Internship at Gelan Technology
• Built positioning software that uses an RSSI model for distance calculation, with a positioning error of less than 0.5 m.
• Applied a Kalman filter to improve the algorithm; collaborated with team members to improve the signal processing stability of the physical layer of the Wi-Fi system.
Projects
Android Microblog APP
• Built an Android microblog app that supports translation; used Adobe XD to design the wireframes and ApacheBench (ab) to load-test the system, achieving an average of 4706 requests per second.
• Designed the UI with support for switching theme modes, implemented XSS defense, and combined Redis and MySQL as the system's data store.
• Developed a neural machine translation system; preprocessed training data using tokenization and normalization, and improved translation performance through fine-tuning and back-translation.
Book Recommendation System
• Implemented a book recommendation system based on Apache Flink; collaborated with team members to design the entire system, used Sketch for the UI design, and built the front end with React.
• Used Apache Lucene for the frequency calculations that support the recommendation algorithms.
• Developed a scalable ETL pipeline based on Hadoop and HBase to process 132 GB of data.
- Machine Learning Internship at Future Media Lab (Advisor: Professor Hengtao Sheng)
- Owned end-to-end machine learning pipelines to detect fraudulent activities on Remitly’s platform, reducing the fraud sideline rate from 3% to 1.5% compared to rule-based systems while maintaining an industry-leading fraud loss rate (LightGBM, Random Forest, XGBoost).
- Addressed the opacity of model decisions in manual review by building model explanation engines using SHapley Additive exPlanations, which explain individual predictions of black-box models. Collaborated with software engineers to deploy the explanation engine in production. Worked with the operations team to create standard operating procedures and provide coaching on the explanation UI (SHAP, LIME, KNN).
- Programmatically generated labels for millions of training examples using weak supervision, addressing the bottleneck of missing ground-truth labels (Snorkel).
- Reduced label noise through semi-supervised learning, achieving a 1500 bps lift in precision at the same level of recall on the holdout set.
- Mentored machine learning intern projects and hosted brown-bag sessions to share novel machine learning techniques.
- Generated weekly metrics and reports to communicate model performance and progress to business stakeholders.
- Speech Recognition Using MFCC and LPC
- Performed automatic speech recognition in MATLAB on a dataset of 6 letters spoken by 3 different speakers.
- Extracted MFCC and LPC coefficients from speech signals and used them as features for a Gaussian Mixture Model; analyzed the relationship between the number of coefficients and accuracy.
- Analyzed the test data using the Mahalanobis distance to the model because of high autocorrelation.
- Research on Sensor Networks Based on Cellular Automata
- Optimized a cellular automata algorithm and applied it to sensor networks to save energy; applied SDN (Software-Defined Networking) for route control.
- Simulated the average energy change of the nodes with MATLAB and found that node energy decreased more slowly than in previous models.
- Received the department's best research project award for this work.
Distributed File System Design
• Collaborated with team members to implement a distributed system that lets clients operate on files on servers concurrently; implemented RPC on both the client and server sides.
• Developed the two-layer distributed system (local cache and remote server) based on Java RMI.
- Implementation of Water Level Prediction Based on Machine Learning Algorithms
- Used the Pearson correlation coefficient to design adaptive algorithms that reduce the weight of data less relevant to the current prediction and reduce model complexity.
- Optimized and improved the machine learning models (Multivariate Linear Model, Ridge Regression Model, SVM) based on scikit-learn to predict the water level.
- Concluded that the optimized Multivariate Linear Model is the best model for the data, reaching a very low error rate of 0.022%.
- Visualized the water level prediction data.
- Analysis of STBC-MIMO System Under Rayleigh Channel
- Conducted research on the STBC-MIMO system and space-time block codes; used Simulink to build the system model and ran preliminary simulations of the system.
- Analyzed the bit error rate performance curve of the transmitting antennas under the independent Rayleigh channel and compared it with the curve of a single transmitting-receiving antenna under the same conditions.
- Simulated transmission with two transmitting antennas to obtain spatial diversity gain, using one receiving antenna to receive the signals.
- Concluded that the bit error rate decreased significantly after using space-time block codes.
