
Abnormal memory usage in a Python data-processing program

  •   CaptainD · Nov 9, 2021 · 2171 views
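    • A question for fellow V2EX members: I need to incrementally process some large XML files. I found code for this in the Python Cookbook and adapted it to my use case, but it does not seem to work as intended. Memory usage climbs quickly, reaching several GB after roughly a hundred thousand or so lines. I don't understand why and would appreciate any pointers. The core logic is below.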

    • macOS Big Sur

    • Python 3.8.12

    
    from xml.etree.ElementTree import iterparse
    def parse_and_remove(filename, path):
        path_parts = path.split('/')
        doc = iterparse(filename, ('start', 'end'))
        # Skip the root element
        next(doc)
        tag_stack = []
        elem_stack = []
        for event, elem in doc:
            if event == 'start':
                tag_stack.append(elem.tag)
                elem_stack.append(elem)
            elif event == 'end':
                if tag_stack == path_parts:
                    yield elem
                    elem_stack[-2].remove(elem)
                try:
                    tag_stack.pop()
                    elem_stack.pop()
                except IndexError:
                    pass
    
    data = parse_and_remove('my.xml','path')
    client, table = getMongo()
    
    for pothole in data:
        resDict = {
            # extract the fields I need
            }
    
        table.insert(resDict)
    client.close()
    
    1 reply    2021-11-10 09:46:26 +08:00
    2i2Re2PLMaDnghL  
       Nov 10, 2021
    1. Try switching to lxml.
    2. Try using XPath instead of manually iterating and comparing against the path.
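
For illustration only, here is a minimal sketch of the incremental pattern under discussion. It is not a confirmed fix for the memory growth in the question. To stay self-contained it uses the standard library's `iterparse` rather than lxml, so it stands in for suggestion 1 instead of implementing it, and it matches on a tag name rather than a full path. The function name `iter_records` and the sample document are hypothetical. The sketch assumes most of the file's data sits inside the matched elements, since anything outside them stays in memory.

```python
import io
from xml.etree.ElementTree import iterparse


def iter_records(source, tag):
    """Yield every element named `tag`, then release it from the tree."""
    open_elems = []  # elements whose 'start' has been seen but not their 'end'
    for event, elem in iterparse(source, events=("start", "end")):
        if event == "start":
            open_elems.append(elem)
            continue
        open_elems.pop()  # 'end' event: elem is the element on top of the stack
        if elem.tag == tag:
            yield elem
            elem.clear()  # drop the record's children, text and attributes
            if open_elems:
                # detach the now-empty element from its parent
                open_elems[-1].remove(elem)


# Hypothetical sample document standing in for the large file
sample = io.BytesIO(
    b"<root><rows>"
    b"<row><id>1</id></row><row><id>2</id></row><row><id>3</id></row>"
    b"</rows></root>"
)
ids = [row.findtext("id") for row in iter_records(sample, "row")]
print(ids)
```

Each record has to be fully consumed before the next one is requested, because it is cleared as soon as the generator resumes. If lxml is installed, `lxml.etree.iterparse` accepts a `tag` argument that filters events by element name, which may make the stack bookkeeping unnecessary.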